Advanced · SQL Mastery Series

SQL Advanced: Window Functions, CTEs & Analytics

Window functions and CTEs transform how you think about analytical SQL. Go beyond GROUP BY — compute rankings, running totals, moving averages, period comparisons, and hierarchical data without sacrificing row-level detail.

22 min read📅November 23, 2025✍️Priyanshu Pandey📚SQL Mastery Series

Window Functions — What and Why

Window functions perform calculations across a set of rows (a "window") while still returning one row per result row. This is the key difference from GROUP BY:

  • GROUP BY collapses rows — you lose row-level detail
  • Window functions keep every row and add computed values alongside

Real-world use cases:

  • Rank products by revenue within each category
  • Compare this month's sales to last month's for each store
  • Calculate a running total of orders per customer
  • Find which employees are above average within their department
  • Identify the top-N items per group
SQL

-- GROUP BY collapses: 1 row per department, row-level detail is GONE SELECT department_id, AVG(salary) AS dept_avg FROM employees GROUP BY department_id;

-- Window function: every employee row is kept + department average added SELECT emp_id, first_name, salary, department_id, AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary FROM employees; -- Result: each employee row PLUS the average for their department alongside

OVER() Syntax

OVER() Syntax — PARTITION BY, ORDER BY & Frames

Every window function uses the OVER() clause to define its window.

SQL

function_name(expression) OVER ( PARTITION BY partition_column -- divide rows into groups ORDER BY order_column -- define row sequence within each group ROWS/RANGE BETWEEN frame_start -- define the rows included in each calculation AND frame_end )

Components of OVER():

ColumnTypeDescription
PARTITION BY
OptionalDivides rows into independent groups. Like GROUP BY but without collapsing rows.
ORDER BY
OptionalDefines row sequence inside each partition. Required for ranking and running total functions.
ROWS BETWEEN
Optional framePrecisely defines the window frame for aggregate window functions.

Frame Specification

SQL

-- Common frame patterns ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- running total (all rows from start) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW -- last 3 rows (moving average over 3) ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- centered 3-row window ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- entire partition

💡
Window functions execute AFTER WHERE, GROUP BY, and HAVING

Window functions are computed last in the SQL execution order — after all filtering. You cannot filter on a window function's result directly in a WHERE clause. Wrap the query in a subquery or CTE and filter on the outer query.

Ranking Functions

Ranking Functions — ROW_NUMBER, RANK, DENSE_RANK, NTILE

Ranking functions assign a position number to each row within a partition.

SQL

-- Compare all four ranking functions side by side -- Using a sales table where two products have the same revenue SELECT product_name, revenue, ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num, RANK() OVER (ORDER BY revenue DESC) AS rank, DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rank, NTILE(4) OVER (ORDER BY revenue DESC) AS quartile FROM product_sales;

-- Example output (with a tie at rank 2): -- product_name | revenue | row_num | rank | dense_rank | quartile -- Product A | 10000 | 1 | 1 | 1 | 1 -- Product B | 8000 | 2 | 2 | 2 | 1 -- Product C | 8000 | 3 | 2 | 2 | 2 -- Product D | 6000 | 4 | 4 | 3 | 2 -- Product E | 5000 | 5 | 5 | 4 | 3

ColumnTypeDescription
ROW_NUMBER()
RankingAssigns unique numbers — ties broken arbitrarily. No gaps.
RANK()
RankingTied rows get the same rank. Next rank skips (1, 2, 2, 4).
DENSE_RANK()
RankingTied rows get the same rank. No gaps in sequence (1, 2, 2, 3).
NTILE(n)
RankingDivides rows into n equal buckets. Returns the bucket number (1 to n).

Top-N Per Group — The Classic Pattern

SQL

-- Top 3 products by revenue in each category SELECT * FROM ( SELECT category, product_name, revenue, ROW_NUMBER() OVER ( PARTITION BY category ORDER BY revenue DESC ) AS rn FROM product_sales ) ranked WHERE 3>= rn ORDER BY category, rn;

ROW_NUMBER vs RANK for top-N

Use ROW_NUMBER() when you want exactly N rows per group (deterministic). Use RANK() <= N when you want to include all tied entries at position N (you may get more than N rows if there are ties at the boundary).

Percentile — Where Does This Row Stand?

SQL

-- What percentile is each employee's salary in their department? SELECT emp_id, first_name, salary, department_id, ROUND( PERCENT_RANK() OVER ( PARTITION BY department_id ORDER BY salary ) * 100, 1 ) AS salary_percentile FROM employees ORDER BY department_id, salary;

LAG and LEAD

LAG & LEAD — Comparing Rows Across Periods

LAG looks back at previous rows; LEAD looks ahead at next rows. Both are essential for period-over-period comparisons.

LAG — Access a Previous Row's Value

SQL

-- Monthly sales comparison: current month vs previous month SELECT month_year, total_sales, LAG(total_sales) OVER (ORDER BY month_year) AS prev_month_sales, total_sales - LAG(total_sales, 1, 0) OVER (ORDER BY month_year) AS mom_change, ROUND( (total_sales - LAG(total_sales) OVER (ORDER BY month_year)) / NULLIF(LAG(total_sales) OVER (ORDER BY month_year), 0) * 100, 1 ) AS mom_pct_change FROM monthly_sales ORDER BY month_year;

-- LAG(value, offset, default) -- offset: how many rows back (default 1) -- default: value to return when there is no previous row (default NULL)

LEAD — Access a Next Row's Value

SQL

-- Show each employee's next salary review date SELECT emp_id, first_name, review_date, LEAD(review_date) OVER ( PARTITION BY emp_id ORDER BY review_date ) AS next_review_date FROM performance_reviews ORDER BY emp_id, review_date;

Store vs. Previous Store Performance

SQL

-- Compare each store's revenue to the prior-year same-month revenue SELECT store_id, month_year, revenue, LAG(revenue, 12) OVER ( PARTITION BY store_id ORDER BY month_year ) AS same_month_prior_year, ROUND( (revenue - LAG(revenue, 12) OVER (PARTITION BY store_id ORDER BY month_year)) / NULLIF(LAG(revenue, 12) OVER (PARTITION BY store_id ORDER BY month_year), 0) * 100, 1 ) AS yoy_growth_pct FROM store_monthly_sales ORDER BY store_id, month_year;

Running Totals

Running Totals & Moving Averages

Running totals and moving averages are the most common analytical patterns in retail and finance reporting.

Running Total (Cumulative Sum)

SQL

-- Daily running total of sales SELECT sale_date, daily_revenue, SUM(daily_revenue) OVER ( ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_revenue FROM daily_sales ORDER BY sale_date;

-- Running total PER store (PARTITION BY resets the running total per store) SELECT store_id, sale_date, daily_revenue, SUM(daily_revenue) OVER ( PARTITION BY store_id ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS store_cumulative_revenue FROM daily_sales ORDER BY store_id, sale_date;

Moving Average

SQL

-- 7-day moving average of sales (current day + 6 preceding days) SELECT sale_date, daily_revenue, ROUND( AVG(daily_revenue) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 2 ) AS moving_avg_7day FROM daily_sales ORDER BY sale_date;

-- 30-day moving average for trend analysis SELECT sale_date, daily_revenue, ROUND(AVG(daily_revenue) OVER ( ORDER BY sale_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW ), 2) AS ma30 FROM daily_sales ORDER BY sale_date;

Running Minimum and Maximum

SQL

-- Track the all-time high daily sales up to each date SELECT sale_date, daily_revenue, MAX(daily_revenue) OVER ( ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS all_time_high, MIN(daily_revenue) OVER ( ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS all_time_low FROM daily_sales ORDER BY sale_date;

RC:OMS · Multi-Channel Analytics
Window functions in production retail analytics

RC:OMS uses window functions extensively for store performance dashboards, demand forecasting, and cross-channel revenue attribution. The patterns on this page run daily against 100M+ row datasets.

Explore RC:OMS
CTEs

Common Table Expressions (CTE)

A CTE is a named temporary result set defined using the WITH clause. It exists only for the duration of a single query, but can be referenced multiple times within that query.

Why CTEs over subqueries:

  • Dramatically more readable — complex queries become step-by-step narratives
  • Can be referenced multiple times within the same query (unlike inline views)
  • Make it easy to debug intermediate steps
  • Required for recursive queries

Basic CTE

SQL

-- Without CTE (nested subqueries — hard to read) SELECT department_name, avg_salary FROM ( SELECT d.department_name, AVG(e.salary) AS avg_salary FROM employees e JOIN departments d ON d.dept_id = e.department_id WHERE e.status = 'A' GROUP BY d.department_name ) dept_stats WHERE avg_salary > 60000;

-- With CTE (clear and readable) WITH dept_stats AS ( SELECT d.department_name, AVG(e.salary) AS avg_salary FROM employees e JOIN departments d ON d.dept_id = e.department_id WHERE e.status = 'A' GROUP BY d.department_name ) SELECT department_name, avg_salary FROM dept_stats WHERE avg_salary > 60000 ORDER BY avg_salary DESC;

Multiple CTEs — Chained Logic

SQL

-- Multi-step analytics: find top earners by department, then rank departments WITH dept_avgs AS ( SELECT department_id, AVG(salary) AS dept_avg_salary, COUNT(*) AS headcount FROM employees WHERE status = 'A' GROUP BY department_id ), dept_ranked AS ( SELECT da.department_id, d.department_name, da.dept_avg_salary, da.headcount, RANK() OVER (ORDER BY da.dept_avg_salary DESC) AS salary_rank FROM dept_avgs da JOIN departments d ON d.dept_id = da.department_id ) SELECT department_name, ROUND(dept_avg_salary, 2) AS avg_salary, headcount, salary_rank FROM dept_ranked WHERE 5>=salary_rank ORDER BY salary_rank;

💡
CTEs are not materialized by default

In most databases (Oracle, PostgreSQL), a CTE is not automatically materialized (cached). The database optimizer may inline the CTE or execute it multiple times if referenced more than once. If you need guaranteed materialization (run once, result cached), use Oracle's WITH /* + MATERIALIZE */ cte_name AS (...) hint, or a global temporary table.

Recursive CTE

Recursive CTE — Hierarchical Queries

Recursive CTEs execute the same query repeatedly, each time building on the results of the previous iteration. They are the standard way to traverse hierarchical or graph-structured data in SQL.

Structure of a Recursive CTE

SQL

WITH RECURSIVE cte_name AS ( -- Anchor member: the starting point (runs once) SELECT ... FROM table WHERE starting_condition

UNION ALL

-- Recursive member: references the CTE itself (runs repeatedly) SELECT ... FROM table JOIN cte_name ON join_condition -- links to previous iteration's results ) SELECT * FROM cte_name;

-- Note: Oracle uses "WITH cte_name AS (...)" — no RECURSIVE keyword required -- The RECURSIVE keyword is required in PostgreSQL and MySQL 8+

Employee-Manager Hierarchy

SQL

-- Traverse the full management tree from a given root WITH emp_hierarchy AS ( -- Anchor: start with the CEO (manager_id IS NULL) SELECT emp_id, first_name, manager_id, 0 AS depth, first_name AS path FROM employees WHERE manager_id IS NULL

UNION ALL

-- Recursive: join each employee to their manager from the previous level SELECT e.emp_id, e.first_name, e.manager_id, h.depth + 1, h.path || ' > ' || e.first_name AS path FROM employees e JOIN emp_hierarchy h ON h.emp_id = e.manager_id ) SELECT LPAD(' ', depth * 4) || first_name AS hierarchy_view, depth, path FROM emp_hierarchy ORDER BY path;

-- Result: -- CEO -- VP Engineering -- Manager Backend -- Developer 1 -- Developer 2 -- Manager Frontend

Generating a Number Sequence

SQL

-- Generate numbers 1 to 100 without a calendar or sequence table WITH nums AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM nums WHERE n < 100 ) SELECT n FROM nums;

Generating a Date Calendar

SQL

-- Generate every date in Q1 2025 WITH calendar AS ( SELECT DATE '2025-01-01' AS cal_date UNION ALL SELECT cal_date + 1 FROM calendar WHERE cal_date < DATE '2025-03-31' ) SELECT cal_date, TO_CHAR(cal_date, 'Day') AS day_name, EXTRACT(WEEK FROM cal_date) AS week_number FROM calendar ORDER BY cal_date;

⚠️
Always set a recursion depth limit

A recursive CTE without a proper termination condition will recurse indefinitely and consume all available memory. In Oracle, you can use CYCLE detection. In PostgreSQL, set a maximum iteration count in the WHERE clause (WHERE depth < 20). Always test with small datasets first.

Gotchas

Common Gotchas

  • !

    Filtering on window function results in WHERE causes an error. Window functions execute after WHERE. You must wrap the query in a subquery or CTE: WITH ranked AS (SELECT ..., ROW_NUMBER() OVER(...) AS rn FROM ...) SELECT * FROM ranked WHERE rn = 1.

  • !

    RANK() gaps vs DENSE_RANK() no-gaps. When you use RANK() <= 3, you may get more than 3 rows if there are ties at position 3 — and if position 2 has a tie, position 3 becomes position 4 (skip). Decide consciously whether you want RANK or DENSE_RANK for your use case.

  • !

    Missing ORDER BY in window function gives non-deterministic results. Without ORDER BY inside OVER(), functions like ROW_NUMBER() return results in arbitrary order. For running totals and ranking, ORDER BY inside the OVER() clause is mandatory.

  • !

    LAG/LEAD returning NULL on the first/last row. The first row has no previous row, so LAG returns NULL by default. Use the optional third argument LAG(value, 1, 0) to specify a default value instead of NULL.

  • !

    Infinite recursion in recursive CTEs. Always include a termination condition in the recursive member's WHERE clause (e.g., WHERE depth < 50). Circular references in hierarchical data — where employee A reports to B who reports to A — will cause infinite recursion without cycle detection.

  • !

    Confusing ROWS BETWEEN and RANGE BETWEEN. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW looks at the previous 2 physical rows. RANGE BETWEEN 2 PRECEDING AND CURRENT ROW looks at rows where the ORDER BY value is within 2 of the current row — very different when there are ties or gaps in values.

Key Takeaways

Key Takeaways
  • Window functions compute across related rows while preserving individual row detail — unlike GROUP BY which collapses rows. The OVER() clause defines the window with PARTITION BY, ORDER BY, and an optional frame.
  • ROW_NUMBER() assigns unique numbers (no ties). RANK() allows ties with gaps. DENSE_RANK() allows ties without gaps. NTILE(n) divides rows into n equal buckets.
  • LAG/LEAD access values from previous/next rows without a self-join — essential for period-over-period comparisons, churn detection, and sequence analysis.
  • Running totals use SUM() OVER (ORDER BY col ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). Moving averages use AVG() OVER (ORDER BY col ROWS BETWEEN n PRECEDING AND CURRENT ROW).
  • CTEs (WITH clause) make complex queries readable and maintainable. Multiple CTEs chain together like steps in a pipeline. CTEs can reference themselves recursively for hierarchical data traversal.
  • Recursive CTEs need an anchor query (starting point), a recursive query (references the CTE), UNION ALL between them, and a termination condition in the WHERE clause to prevent infinite loops.
RetailCoder
All systems operational
v1.0 Live

RC:OMS

Multi-channel order management with double-entry inventory ledger. Amazon, Flipkart, Shopify, WooCommerce — one source of truth.

Launch demo →
v1.0 Live

RC:Storefront

Self-hosted headless e-commerce. Your server, your data, zero transaction fees. Native RC:OMS inventory sync.

Visit Storefront →
Pipeline

RC:Pulse

AI-powered retail analytics and demand forecasting — built natively on top of your RC:OMS and Storefront data.

Request early access →
Built in India 🇮🇳  ·  Architected by Priyanshu PandeyTalk to an engineer →