Exploring the MySQL `ROW_NUMBER` Window Function

The ROW_NUMBER() function in MySQL is a powerful tool that assigns a unique sequential integer to rows within a result set, based on the specified partition and order. This article delves into practical applications of ROW_NUMBER(), using it to solve real-world SQL challenges.

Understanding Window Functions and `ROW_NUMBER()`

A window function in SQL is a type of function that performs a calculation across a set of table rows that are somehow related to the current row. This concept is distinguished from regular SQL functions because window functions do not cause the rows to become grouped into a single output row. This allows you to perform operations on a set of rows that are related to the current row and return multiple rows as a result, unlike aggregate functions.

The keyword OVER is used in conjunction with window functions to define the window or set of rows that the function operates on. Within this clause, you can specify how the rows are partitioned and ordered using the PARTITION BY and ORDER BY keywords, respectively.

ROW_NUMBER(): This function returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
OVER Keyword: It must accompany every window function and defines the window or set of rows to which the function applies.
PARTITION BY: This clause divides the result set into partitions, to which the ROW_NUMBER() function is applied independently.
ORDER BY: Specifies the order of rows within each partition.

These elements come together in SQL syntax like this:

ROW_NUMBER() OVER (PARTITION BY column_list ORDER BY column_list)

In simpler terms, window functions allow you to match each row with a set of rows and compute values based on them. ROW_NUMBER() specifically provides a way to number these rows sequentially. This is extremely useful for tasks like ranking or pagination, where understanding the specific order or position of a row in a dataset is necessary.

Setting Up the `employees` Table

Before diving into the examples, let's create an employees table populated with sample data to demonstrate the capabilities of ROW_NUMBER(). This will help visualize each function's application.

CREATE TABLE employees (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100),
  department VARCHAR(50),
  salary DECIMAL(10, 2),
  country VARCHAR(50),
  gender VARCHAR(10)
);

INSERT INTO employees (name, department, salary, country, gender) VALUES
('Alice', 'Engineering', 85000, 'USA', 'Female'),
('Bob', 'Engineering', 95000, 'UK', 'Male'),
('Charlie', 'Marketing', 67000, 'USA', 'Male'),
('David', 'Marketing', 65000, 'UK', 'Male'),
('Eve', 'Sales', 62000, 'USA', 'Female'),
('Frank', 'Sales', 61000, 'USA', 'Male'),
('Grace', 'Engineering', 78000, 'Canada', 'Female'),
('Hank', 'Engineering', 82000, 'Canada', 'Male'),
('Ivy', 'Engineering', 61000, 'USA', 'Female'),
('Jack', 'Marketing', 68000, 'USA', 'Male'),
('Ken', 'Sales', 72000, 'UK', 'Male'),
('Liam', 'Sales', 71000, 'USA', 'Male'),
('Mona', 'Product', 93000, 'Canada', 'Female'),
('Nina', 'Product', 88000, 'UK', 'Female'),
('Oscar', 'Engineering', 79000, 'USA', 'Male'),
('Paul', 'Marketing', 75000, 'Canada', 'Male'),
('Quinn', 'Sales', 73000, 'Canada', 'Female'),
('Rita', 'Product', 94000, 'USA', 'Female'),
('Sam', 'Product', 92000, 'UK', 'Male'),
('Tom', 'Product', 88000, 'Canada', 'Male');

This table features 20 employees across four departments, with varying salaries, countries, and genders.

Rank Employees by Salary Within Each Department

In scenarios where you need to rank employees according to their salary within each department, ROW_NUMBER() offers a straightforward solution:

SELECT 
  department,
  name,
  salary,
  country,
  gender,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

Output Explanation:

This query ranks employees based on their salary in descending order within each department. For example, the highest-paid employee in 'Engineering' receives a rank of 1 and so on:

department	name	salary	country	gender	salary_rank
Engineering	Bob	95000.00	UK	Male	1
Engineering	Alice	85000.00	USA	Female	2
Engineering	Hank	82000.00	Canada	Male	3
Engineering	Oscar	79000.00	USA	Male	4
Engineering	Grace	78000.00	Canada	Female	5
Engineering	Ivy	61000.00	USA	Female	6
Marketing	Paul	75000.00	Canada	Male	1
Marketing	Jack	68000.00	USA	Male	2
Marketing	Charlie	67000.00	USA	Male	3
Marketing	David	65000.00	UK	Male	4
Product	Rita	94000.00	USA	Female	1
Product	Mona	93000.00	Canada	Female	2
Product	Sam	92000.00	UK	Male	3
Product	Tom	88000.00	Canada	Male	4
Product	Nina	88000.00	UK	Female	5
Sales	Quinn	73000.00	Canada	Female	1
Sales	Ken	72000.00	UK	Male	2
Sales	Liam	71000.00	USA	Male	3
Sales	Eve	62000.00	USA	Female	4
Sales	Frank	61000.00	USA	Male	5

Get Top N (e.g., Top 2) Highest Paid Employees per Department

To extract a specific number of top-paid employees per department, use a common table expression (CTE) with ROW_NUMBER():

WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT * FROM ranked WHERE rnk <= 2;

Output Explanation:

This query returns the top two employees in salary per department:

id	name	department	salary	country	gender	rnk
2	Bob	Engineering	95000.00	UK	Male	1
1	Alice	Engineering	85000.00	USA	Female	2
16	Paul	Marketing	75000.00	Canada	Male	1
10	Jack	Marketing	68000.00	USA	Male	2
18	Rita	Product	94000.00	USA	Female	1
13	Mona	Product	93000.00	Canada	Female	2
17	Quinn	Sales	73000.00	Canada	Female	1
11	Ken	Sales	72000.00	UK	Male	2

Get Top N (e.g., Top 2) Lowest Paid Employees per Country

Similarly, to find the top lowest paid employees per country, modify the partitioning and ordering logic:

WITH ranked_lowest AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY country ORDER BY salary ASC) AS rnk
  FROM employees
)
SELECT * FROM ranked_lowest WHERE rnk <= 2;

Output Explanation:

This query fetches the top two lowest-paid employees in salary per country:

id	name	department	salary	country	gender	rnk
17	Quinn	Sales	73000.00	Canada	Female	1
16	Paul	Marketing	75000.00	Canada	Male	2
4	David	Marketing	65000.00	UK	Male	1
11	Ken	Sales	72000.00	UK	Male	2
9	Ivy	Engineering	61000.00	USA	Female	1
6	Frank	Sales	61000.00	USA	Male	2

Paginate Results with Stable Ordering

For applications where data needs to be paginated, ROW_NUMBER() helps establish stable stateful pagination:

SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY department, name) AS rn
  FROM employees
) AS numbered
WHERE rn BETWEEN 6 AND 10;

Output Explanation:

This query selects rows 6 through 10 based on alphabetical ordering by department and name:

id	name	department	salary	country	gender	rn
15	Oscar	Engineering	79000.00	USA	Male	6
3	Charlie	Marketing	67000.00	USA	Male	7
4	David	Marketing	65000.00	UK	Male	8
10	Jack	Marketing	68000.00	USA	Male	9
16	Paul	Marketing	75000.00	Canada	Male	10

To demonstrate pagination for specific demographics, such as filtering for female employees, the following query is used to arrange and paginate these records. It orders them alphabetically by department and name, showing rows 1 through 5:

SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY department, name) AS rn
  FROM employees
  WHERE gender = 'Female'
) AS numbered
WHERE rn BETWEEN 1 AND 5;

Output Explanation:

This query filters and paginates female employees, organizing them in alphabetical order by department and name, then selecting rows 1 through 5:

id	name	department	salary	country	gender	rn
1	Alice	Engineering	85000.00	USA	Female	1
7	Grace	Engineering	78000.00	Canada	Female	2
9	Ivy	Engineering	61000.00	USA	Female	3
13	Mona	Product	93000.00	Canada	Female	4
14	Nina	Product	88000.00	UK	Female	5

Detect First Record in Each Group

ROW_NUMBER() proves invaluable for identifying the first entry in grouped records:

WITH first_in_group AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY id) AS rnk
  FROM employees
)
SELECT * FROM first_in_group WHERE rnk = 1;

Output Explanation:

This query fetches the first entry for each department based on their id:

id	name	department	salary	country	gender	rnk
1	Alice	Engineering	85000.00	USA	Female	1
3	Charlie	Marketing	67000.00	USA	Male	1
13	Mona	Product	93000.00	Canada	Female	1
5	Eve	Sales	62000.00	USA	Female	1

Another detailed and useful application is to identify the highest-paid employee within each country, ensuring that the salary is considered in descending order to spot the top earner efficiently:

WITH top_earner AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY country ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT * FROM top_earner WHERE rnk = 1;

Output Explanation:

This query identifies the highest-paid employee in each country:

id	name	department	salary	country	gender	rnk
13	Mona	Product	93000.00	Canada	Female	1
2	Bob	Engineering	95000.00	UK	Male	1
18	Rita	Product	94000.00	USA	Female	1

Conclusion

By mastering the ROW_NUMBER() function, you gain powerful tools for managing data queries that involve ranking, pagination, deduplication, and more. Employing these techniques translates to more robust and efficient database management, allowing for sophisticated data manipulation and retrieval.

Exploring the MySQL ROW_NUMBER Window Function

Understanding Window Functions and ROW_NUMBER()

Setting Up the employees Table

Rank Employees by Salary Within Each Department

Get Top N (e.g., Top 2) Highest Paid Employees per Department

Get Top N (e.g., Top 2) Lowest Paid Employees per Country

Paginate Results with Stable Ordering

Detect First Record in Each Group

Conclusion

Exploring the MySQL `ROW_NUMBER` Window Function

Understanding Window Functions and `ROW_NUMBER()`

Setting Up the `employees` Table