SQL Advanced: Window Functions, CTEs & Analytics
Window functions and CTEs transform how you think about analytical SQL. Go beyond GROUP BY — compute rankings, running totals, moving averages, period comparisons, and hierarchical data without sacrificing row-level detail.
Window Functions — What and Why
Window functions perform calculations across a set of rows (a "window") while still returning one row per result row. This is the key difference from GROUP BY:
- GROUP BY collapses rows — you lose row-level detail
- Window functions keep every row and add computed values alongside
Real-world use cases:
- Rank products by revenue within each category
- Compare this month's sales to last month's for each store
- Calculate a running total of orders per customer
- Find which employees are above average within their department
- Identify the top-N items per group
-- GROUP BY collapses: 1 row per department, row-level detail is GONE
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id;
-- Window function: every employee row is kept + department average added
SELECT emp_id, first_name, salary, department_id,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM employees;
-- Result: each employee row PLUS the average for their department alongside
OVER() Syntax — PARTITION BY, ORDER BY & Frames
Every window function uses the OVER() clause to define its window.
function_name(expression)
OVER (
PARTITION BY partition_column -- divide rows into groups
ORDER BY order_column -- define row sequence within each group
ROWS/RANGE BETWEEN frame_start -- define the rows included in each calculation
AND frame_end
)
Components of OVER():
| Column | Type | Description |
|---|---|---|
PARTITION BY | Optional | Divides rows into independent groups. Like GROUP BY but without collapsing rows. |
ORDER BY | Optional | Defines row sequence inside each partition. Required for ranking and running total functions. |
ROWS BETWEEN | Optional frame | Precisely defines the window frame for aggregate window functions. |
Frame Specification
-- Common frame patterns
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- running total (all rows from start)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW -- last 3 rows (moving average over 3)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- centered 3-row window
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- entire partition
Window functions are computed last in the SQL execution order — after all filtering. You cannot filter on a window function's result directly in a WHERE clause. Wrap the query in a subquery or CTE and filter on the outer query.
Ranking Functions — ROW_NUMBER, RANK, DENSE_RANK, NTILE
Ranking functions assign a position number to each row within a partition.
-- Compare all four ranking functions side by side
-- Using a sales table where two products have the same revenue
SELECT product_name,
revenue,
ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
RANK() OVER (ORDER BY revenue DESC) AS rank,
DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rank,
NTILE(4) OVER (ORDER BY revenue DESC) AS quartile
FROM product_sales;
-- Example output (with a tie at rank 2):
-- product_name | revenue | row_num | rank | dense_rank | quartile
-- Product A | 10000 | 1 | 1 | 1 | 1
-- Product B | 8000 | 2 | 2 | 2 | 1
-- Product C | 8000 | 3 | 2 | 2 | 2
-- Product D | 6000 | 4 | 4 | 3 | 2
-- Product E | 5000 | 5 | 5 | 4 | 3
| Column | Type | Description |
|---|---|---|
ROW_NUMBER() | Ranking | Assigns unique numbers — ties broken arbitrarily. No gaps. |
RANK() | Ranking | Tied rows get the same rank. Next rank skips (1, 2, 2, 4). |
DENSE_RANK() | Ranking | Tied rows get the same rank. No gaps in sequence (1, 2, 2, 3). |
NTILE(n) | Ranking | Divides rows into n equal buckets. Returns the bucket number (1 to n). |
Top-N Per Group — The Classic Pattern
-- Top 3 products by revenue in each category
SELECT *
FROM (
SELECT category,
product_name,
revenue,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY revenue DESC
) AS rn
FROM product_sales
) ranked
WHERE 3>= rn
ORDER BY category, rn;
Use ROW_NUMBER() when you want exactly N rows per group (deterministic). Use RANK() <= N when you want to include all tied entries at position N (you may get more than N rows if there are ties at the boundary).
Percentile — Where Does This Row Stand?
-- What percentile is each employee's salary in their department?
SELECT emp_id,
first_name,
salary,
department_id,
ROUND(
PERCENT_RANK() OVER (
PARTITION BY department_id
ORDER BY salary
) * 100, 1
) AS salary_percentile
FROM employees
ORDER BY department_id, salary;
LAG & LEAD — Comparing Rows Across Periods
LAG looks back at previous rows; LEAD looks ahead at next rows. Both are essential for period-over-period comparisons.
LAG — Access a Previous Row's Value
-- Monthly sales comparison: current month vs previous month
SELECT month_year,
total_sales,
LAG(total_sales) OVER (ORDER BY month_year) AS prev_month_sales,
total_sales - LAG(total_sales, 1, 0) OVER (ORDER BY month_year) AS mom_change,
ROUND(
(total_sales - LAG(total_sales) OVER (ORDER BY month_year))
/ NULLIF(LAG(total_sales) OVER (ORDER BY month_year), 0) * 100, 1
) AS mom_pct_change
FROM monthly_sales
ORDER BY month_year;
-- LAG(value, offset, default)
-- offset: how many rows back (default 1)
-- default: value to return when there is no previous row (default NULL)
LEAD — Access a Next Row's Value
-- Show each employee's next salary review date
SELECT emp_id,
first_name,
review_date,
LEAD(review_date) OVER (
PARTITION BY emp_id
ORDER BY review_date
) AS next_review_date
FROM performance_reviews
ORDER BY emp_id, review_date;
Store vs. Previous Store Performance
-- Compare each store's revenue to the prior-year same-month revenue
SELECT store_id,
month_year,
revenue,
LAG(revenue, 12) OVER (
PARTITION BY store_id
ORDER BY month_year
) AS same_month_prior_year,
ROUND(
(revenue - LAG(revenue, 12) OVER (PARTITION BY store_id ORDER BY month_year))
/ NULLIF(LAG(revenue, 12) OVER (PARTITION BY store_id ORDER BY month_year), 0) * 100, 1
) AS yoy_growth_pct
FROM store_monthly_sales
ORDER BY store_id, month_year;
Running Totals & Moving Averages
Running totals and moving averages are the most common analytical patterns in retail and finance reporting.
Running Total (Cumulative Sum)
-- Daily running total of sales
SELECT sale_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily_sales
ORDER BY sale_date;
-- Running total PER store (PARTITION BY resets the running total per store)
SELECT store_id,
sale_date,
daily_revenue,
SUM(daily_revenue) OVER (
PARTITION BY store_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS store_cumulative_revenue
FROM daily_sales
ORDER BY store_id, sale_date;
Moving Average
-- 7-day moving average of sales (current day + 6 preceding days)
SELECT sale_date,
daily_revenue,
ROUND(
AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2
) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;
-- 30-day moving average for trend analysis
SELECT sale_date,
daily_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
), 2) AS ma30
FROM daily_sales
ORDER BY sale_date;
Running Minimum and Maximum
-- Track the all-time high daily sales up to each date
SELECT sale_date,
daily_revenue,
MAX(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS all_time_high,
MIN(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS all_time_low
FROM daily_sales
ORDER BY sale_date;
RC:OMS uses window functions extensively for store performance dashboards, demand forecasting, and cross-channel revenue attribution. The patterns on this page run daily against 100M+ row datasets.
Explore RC:OMS →Common Table Expressions (CTE)
A CTE is a named temporary result set defined using the WITH clause. It exists only for the duration of a single query, but can be referenced multiple times within that query.
Why CTEs over subqueries:
- Dramatically more readable — complex queries become step-by-step narratives
- Can be referenced multiple times within the same query (unlike inline views)
- Make it easy to debug intermediate steps
- Required for recursive queries
Basic CTE
-- Without CTE (nested subqueries — hard to read)
SELECT department_name, avg_salary
FROM (
SELECT d.department_name, AVG(e.salary) AS avg_salary
FROM employees e JOIN departments d ON d.dept_id = e.department_id
WHERE e.status = 'A'
GROUP BY d.department_name
) dept_stats
WHERE avg_salary > 60000;
-- With CTE (clear and readable)
WITH dept_stats AS (
SELECT d.department_name,
AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON d.dept_id = e.department_id
WHERE e.status = 'A'
GROUP BY d.department_name
)
SELECT department_name, avg_salary
FROM dept_stats
WHERE avg_salary > 60000
ORDER BY avg_salary DESC;
Multiple CTEs — Chained Logic
-- Multi-step analytics: find top earners by department, then rank departments
WITH
dept_avgs AS (
SELECT department_id,
AVG(salary) AS dept_avg_salary,
COUNT(*) AS headcount
FROM employees
WHERE status = 'A'
GROUP BY department_id
),
dept_ranked AS (
SELECT da.department_id,
d.department_name,
da.dept_avg_salary,
da.headcount,
RANK() OVER (ORDER BY da.dept_avg_salary DESC) AS salary_rank
FROM dept_avgs da
JOIN departments d ON d.dept_id = da.department_id
)
SELECT department_name,
ROUND(dept_avg_salary, 2) AS avg_salary,
headcount,
salary_rank
FROM dept_ranked
WHERE 5>=salary_rank
ORDER BY salary_rank;
In most databases (Oracle, PostgreSQL), a CTE is not automatically materialized (cached). The database optimizer may inline the CTE or execute it multiple times if referenced more than once. If you need guaranteed materialization (run once, result cached), use Oracle's WITH /* + MATERIALIZE */ cte_name AS (...) hint, or a global temporary table.
Recursive CTE — Hierarchical Queries
Recursive CTEs execute the same query repeatedly, each time building on the results of the previous iteration. They are the standard way to traverse hierarchical or graph-structured data in SQL.
Structure of a Recursive CTE
WITH RECURSIVE cte_name AS (
-- Anchor member: the starting point (runs once)
SELECT ...
FROM table
WHERE starting_condition
UNION ALL
-- Recursive member: references the CTE itself (runs repeatedly)
SELECT ...
FROM table
JOIN cte_name ON join_condition -- links to previous iteration's results
)
SELECT * FROM cte_name;
-- Note: Oracle uses "WITH cte_name AS (...)" — no RECURSIVE keyword required
-- The RECURSIVE keyword is required in PostgreSQL and MySQL 8+
Employee-Manager Hierarchy
-- Traverse the full management tree from a given root
WITH emp_hierarchy AS (
-- Anchor: start with the CEO (manager_id IS NULL)
SELECT emp_id,
first_name,
manager_id,
0 AS depth,
first_name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join each employee to their manager from the previous level
SELECT e.emp_id,
e.first_name,
e.manager_id,
h.depth + 1,
h.path || ' > ' || e.first_name AS path
FROM employees e
JOIN emp_hierarchy h ON h.emp_id = e.manager_id
)
SELECT LPAD(' ', depth * 4) || first_name AS hierarchy_view,
depth,
path
FROM emp_hierarchy
ORDER BY path;
-- Result:
-- CEO
-- VP Engineering
-- Manager Backend
-- Developer 1
-- Developer 2
-- Manager Frontend
Generating a Number Sequence
-- Generate numbers 1 to 100 without a calendar or sequence table
WITH nums AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM nums
WHERE n < 100
)
SELECT n FROM nums;
Generating a Date Calendar
-- Generate every date in Q1 2025
WITH calendar AS (
SELECT DATE '2025-01-01' AS cal_date
UNION ALL
SELECT cal_date + 1
FROM calendar
WHERE cal_date < DATE '2025-03-31'
)
SELECT cal_date,
TO_CHAR(cal_date, 'Day') AS day_name,
EXTRACT(WEEK FROM cal_date) AS week_number
FROM calendar
ORDER BY cal_date;
A recursive CTE without a proper termination condition will recurse indefinitely and consume all available memory. In Oracle, you can use CYCLE detection. In PostgreSQL, set a maximum iteration count in the WHERE clause (WHERE depth < 20). Always test with small datasets first.
Common Gotchas
- !
Filtering on window function results in WHERE causes an error. Window functions execute after WHERE. You must wrap the query in a subquery or CTE:
WITH ranked AS (SELECT ..., ROW_NUMBER() OVER(...) AS rn FROM ...) SELECT * FROM ranked WHERE rn = 1. - !
RANK() gaps vs DENSE_RANK() no-gaps. When you use
RANK() <= 3, you may get more than 3 rows if there are ties at position 3 — and if position 2 has a tie, position 3 becomes position 4 (skip). Decide consciously whether you want RANK or DENSE_RANK for your use case. - !
Missing ORDER BY in window function gives non-deterministic results. Without ORDER BY inside OVER(), functions like ROW_NUMBER() return results in arbitrary order. For running totals and ranking, ORDER BY inside the OVER() clause is mandatory.
- !
LAG/LEAD returning NULL on the first/last row. The first row has no previous row, so LAG returns NULL by default. Use the optional third argument
LAG(value, 1, 0)to specify a default value instead of NULL. - !
Infinite recursion in recursive CTEs. Always include a termination condition in the recursive member's WHERE clause (e.g.,
WHERE depth < 50). Circular references in hierarchical data — where employee A reports to B who reports to A — will cause infinite recursion without cycle detection. - !
Confusing ROWS BETWEEN and RANGE BETWEEN.
ROWS BETWEEN 2 PRECEDING AND CURRENT ROWlooks at the previous 2 physical rows.RANGE BETWEEN 2 PRECEDING AND CURRENT ROWlooks at rows where the ORDER BY value is within 2 of the current row — very different when there are ties or gaps in values.
Key Takeaways
- ✓Window functions compute across related rows while preserving individual row detail — unlike GROUP BY which collapses rows. The OVER() clause defines the window with PARTITION BY, ORDER BY, and an optional frame.
- ✓ROW_NUMBER() assigns unique numbers (no ties). RANK() allows ties with gaps. DENSE_RANK() allows ties without gaps. NTILE(n) divides rows into n equal buckets.
- ✓LAG/LEAD access values from previous/next rows without a self-join — essential for period-over-period comparisons, churn detection, and sequence analysis.
- ✓Running totals use SUM() OVER (ORDER BY col ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). Moving averages use AVG() OVER (ORDER BY col ROWS BETWEEN n PRECEDING AND CURRENT ROW).
- ✓CTEs (WITH clause) make complex queries readable and maintainable. Multiple CTEs chain together like steps in a pipeline. CTEs can reference themselves recursively for hierarchical data traversal.
- ✓Recursive CTEs need an anchor query (starting point), a recursive query (references the CTE), UNION ALL between them, and a termination condition in the WHERE clause to prevent infinite loops.
