The ROW_NUMBER()
function in MySQL is a powerful tool that assigns a unique sequential integer to rows within a result set, based on the specified partition and order. This article delves into practical applications of ROW_NUMBER()
, using it to solve real-world SQL challenges.
Understanding Window Functions and ROW_NUMBER()
A window function in SQL is a type of function that performs a calculation across a set of table rows that are somehow related to the current row. This concept is distinguished from regular SQL functions because window functions do not cause the rows to become grouped into a single output row. This allows you to perform operations on a set of rows that are related to the current row and return multiple rows as a result, unlike aggregate functions.
The keyword OVER
is used in conjunction with window functions to define the window or set of rows that the function operates on. Within this clause, you can specify how the rows are partitioned and ordered using the PARTITION BY
and ORDER BY
keywords, respectively.
ROW_NUMBER()
: This function returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.OVER
Keyword: It must accompany every window function and defines the window or set of rows to which the function applies.PARTITION BY
: This clause divides the result set into partitions, to which theROW_NUMBER()
function is applied independently.ORDER BY
: Specifies the order of rows within each partition.
These elements come together in SQL syntax like this:
ROW_NUMBER() OVER (PARTITION BY column_list ORDER BY column_list)
In simpler terms, window functions allow you to match each row with a set of rows and compute values based on them. ROW_NUMBER()
specifically provides a way to number these rows sequentially. This is extremely useful for tasks like ranking or pagination, where understanding the specific order or position of a row in a dataset is necessary.
Setting Up the employees
Table
Before diving into the examples, let's create an employees
table populated with sample data to demonstrate the capabilities of ROW_NUMBER()
. This will help visualize each function's application.
CREATE TABLE employees (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2),
country VARCHAR(50),
gender VARCHAR(10)
);
INSERT INTO employees (name, department, salary, country, gender) VALUES
('Alice', 'Engineering', 85000, 'USA', 'Female'),
('Bob', 'Engineering', 95000, 'UK', 'Male'),
('Charlie', 'Marketing', 67000, 'USA', 'Male'),
('David', 'Marketing', 65000, 'UK', 'Male'),
('Eve', 'Sales', 62000, 'USA', 'Female'),
('Frank', 'Sales', 61000, 'USA', 'Male'),
('Grace', 'Engineering', 78000, 'Canada', 'Female'),
('Hank', 'Engineering', 82000, 'Canada', 'Male'),
('Ivy', 'Engineering', 61000, 'USA', 'Female'),
('Jack', 'Marketing', 68000, 'USA', 'Male'),
('Ken', 'Sales', 72000, 'UK', 'Male'),
('Liam', 'Sales', 71000, 'USA', 'Male'),
('Mona', 'Product', 93000, 'Canada', 'Female'),
('Nina', 'Product', 88000, 'UK', 'Female'),
('Oscar', 'Engineering', 79000, 'USA', 'Male'),
('Paul', 'Marketing', 75000, 'Canada', 'Male'),
('Quinn', 'Sales', 73000, 'Canada', 'Female'),
('Rita', 'Product', 94000, 'USA', 'Female'),
('Sam', 'Product', 92000, 'UK', 'Male'),
('Tom', 'Product', 88000, 'Canada', 'Male');
This table features 20 employees across four departments, with varying salaries, countries, and genders.
Rank Employees by Salary Within Each Department
In scenarios where you need to rank employees according to their salary within each department, ROW_NUMBER()
offers a straightforward solution:
SELECT
department,
name,
salary,
country,
gender,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
Output Explanation:
This query ranks employees based on their salary in descending order within each department. For example, the highest-paid employee in 'Engineering' receives a rank of 1 and so on:
department | name | salary | country | gender | salary_rank |
---|---|---|---|---|---|
Engineering | Bob | 95000.00 | UK | Male | 1 |
Engineering | Alice | 85000.00 | USA | Female | 2 |
Engineering | Hank | 82000.00 | Canada | Male | 3 |
Engineering | Oscar | 79000.00 | USA | Male | 4 |
Engineering | Grace | 78000.00 | Canada | Female | 5 |
Engineering | Ivy | 61000.00 | USA | Female | 6 |
Marketing | Paul | 75000.00 | Canada | Male | 1 |
Marketing | Jack | 68000.00 | USA | Male | 2 |
Marketing | Charlie | 67000.00 | USA | Male | 3 |
Marketing | David | 65000.00 | UK | Male | 4 |
Product | Rita | 94000.00 | USA | Female | 1 |
Product | Mona | 93000.00 | Canada | Female | 2 |
Product | Sam | 92000.00 | UK | Male | 3 |
Product | Tom | 88000.00 | Canada | Male | 4 |
Product | Nina | 88000.00 | UK | Female | 5 |
Sales | Quinn | 73000.00 | Canada | Female | 1 |
Sales | Ken | 72000.00 | UK | Male | 2 |
Sales | Liam | 71000.00 | USA | Male | 3 |
Sales | Eve | 62000.00 | USA | Female | 4 |
Sales | Frank | 61000.00 | USA | Male | 5 |
Get Top N (e.g., Top 2) Highest Paid Employees per Department
To extract a specific number of top-paid employees per department, use a common table expression (CTE) with ROW_NUMBER()
:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT * FROM ranked WHERE rnk <= 2;
Output Explanation:
This query returns the top two employees in salary per department:
id | name | department | salary | country | gender | rnk |
---|---|---|---|---|---|---|
2 | Bob | Engineering | 95000.00 | UK | Male | 1 |
1 | Alice | Engineering | 85000.00 | USA | Female | 2 |
16 | Paul | Marketing | 75000.00 | Canada | Male | 1 |
10 | Jack | Marketing | 68000.00 | USA | Male | 2 |
18 | Rita | Product | 94000.00 | USA | Female | 1 |
13 | Mona | Product | 93000.00 | Canada | Female | 2 |
17 | Quinn | Sales | 73000.00 | Canada | Female | 1 |
11 | Ken | Sales | 72000.00 | UK | Male | 2 |
Get Top N (e.g., Top 2) Lowest Paid Employees per Country
Similarly, to find the top lowest paid employees per country, modify the partitioning and ordering logic:
WITH ranked_lowest AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY salary ASC) AS rnk
FROM employees
)
SELECT * FROM ranked_lowest WHERE rnk <= 2;
Output Explanation:
This query fetches the top two lowest-paid employees in salary per country:
id | name | department | salary | country | gender | rnk |
---|---|---|---|---|---|---|
17 | Quinn | Sales | 73000.00 | Canada | Female | 1 |
16 | Paul | Marketing | 75000.00 | Canada | Male | 2 |
4 | David | Marketing | 65000.00 | UK | Male | 1 |
11 | Ken | Sales | 72000.00 | UK | Male | 2 |
9 | Ivy | Engineering | 61000.00 | USA | Female | 1 |
6 | Frank | Sales | 61000.00 | USA | Male | 2 |
Paginate Results with Stable Ordering
For applications where data needs to be paginated, ROW_NUMBER()
helps establish stable stateful pagination:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY department, name) AS rn
FROM employees
) AS numbered
WHERE rn BETWEEN 6 AND 10;
Output Explanation:
This query selects rows 6 through 10 based on alphabetical ordering by department and name:
id | name | department | salary | country | gender | rn |
---|---|---|---|---|---|---|
15 | Oscar | Engineering | 79000.00 | USA | Male | 6 |
3 | Charlie | Marketing | 67000.00 | USA | Male | 7 |
4 | David | Marketing | 65000.00 | UK | Male | 8 |
10 | Jack | Marketing | 68000.00 | USA | Male | 9 |
16 | Paul | Marketing | 75000.00 | Canada | Male | 10 |
To demonstrate pagination for specific demographics, such as filtering for female employees, the following query is used to arrange and paginate these records. It orders them alphabetically by department and name, showing rows 1 through 5:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY department, name) AS rn
FROM employees
WHERE gender = 'Female'
) AS numbered
WHERE rn BETWEEN 1 AND 5;
Output Explanation:
This query filters and paginates female employees, organizing them in alphabetical order by department and name, then selecting rows 1 through 5:
id | name | department | salary | country | gender | rn |
---|---|---|---|---|---|---|
1 | Alice | Engineering | 85000.00 | USA | Female | 1 |
7 | Grace | Engineering | 78000.00 | Canada | Female | 2 |
9 | Ivy | Engineering | 61000.00 | USA | Female | 3 |
13 | Mona | Product | 93000.00 | Canada | Female | 4 |
14 | Nina | Product | 88000.00 | UK | Female | 5 |
Detect First Record in Each Group
ROW_NUMBER()
proves invaluable for identifying the first entry in grouped records:
WITH first_in_group AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY id) AS rnk
FROM employees
)
SELECT * FROM first_in_group WHERE rnk = 1;
Output Explanation:
This query fetches the first entry for each department based on their id
:
id | name | department | salary | country | gender | rnk |
---|---|---|---|---|---|---|
1 | Alice | Engineering | 85000.00 | USA | Female | 1 |
3 | Charlie | Marketing | 67000.00 | USA | Male | 1 |
13 | Mona | Product | 93000.00 | Canada | Female | 1 |
5 | Eve | Sales | 62000.00 | USA | Female | 1 |
Another detailed and useful application is to identify the highest-paid employee within each country, ensuring that the salary is considered in descending order to spot the top earner efficiently:
WITH top_earner AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT * FROM top_earner WHERE rnk = 1;
Output Explanation:
This query identifies the highest-paid employee in each country:
id | name | department | salary | country | gender | rnk |
---|---|---|---|---|---|---|
13 | Mona | Product | 93000.00 | Canada | Female | 1 |
2 | Bob | Engineering | 95000.00 | UK | Male | 1 |
18 | Rita | Product | 94000.00 | USA | Female | 1 |
Conclusion
By mastering the ROW_NUMBER()
function, you gain powerful tools for managing data queries that involve ranking, pagination, deduplication, and more. Employing these techniques translates to more robust and efficient database management, allowing for sophisticated data manipulation and retrieval.