SQL Intermediate: Joins, Aggregates & Beyond
You know SELECT and WHERE. Now it's time to combine tables with JOINs, summarize data with aggregate functions, write subqueries, use set operators, control flow with CASE, and supercharge performance with indexes.
SQL Joins — Combining Tables
SQL Joins combine rows from two or more tables based on a related column. Joins are the primary way relational databases connect data spread across normalized tables.
The sample data used throughout this section:
-- Employees table (e)
-- emp_id | first_name | department_id | salary
-- 101 | Priya | 10 | 75000
-- 102 | Rahul | 20 | 80000
-- 103 | Anita | NULL | 65000 ← no dept assigned
-- Departments table (d)
-- dept_id | department_name
-- 10 | IT
-- 20 | Finance
-- 30 | HR ← no employees in HR
INNER JOIN — Only Matching Rows
Returns rows that have a matching value in both tables.
SELECT e.first_name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.dept_id;
-- Result: Priya/IT, Rahul/Finance
-- Anita and HR are both excluded (no match)
LEFT JOIN — All Left Rows, Matched Right Rows
Returns all rows from the left table and matching rows from the right. Unmatched right rows appear as NULL.
SELECT e.first_name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.dept_id;
-- Result: Priya/IT, Rahul/Finance, Anita/NULL
-- Anita is included with NULL department (she has no dept assigned)
RIGHT JOIN — All Right Rows, Matched Left Rows
Returns all rows from the right table and matching rows from the left.
SELECT e.first_name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.dept_id;
-- Result: Priya/IT, Rahul/Finance, NULL/HR
-- HR is included with NULL employee (no one assigned to HR)
FULL OUTER JOIN — All Rows from Both Tables
Returns all rows when there is a match in either table. Unmatched rows on either side get NULL.
SELECT e.first_name, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.dept_id;
-- Result: Priya/IT, Rahul/Finance, Anita/NULL, NULL/HR
-- No data is lost from either table
CROSS JOIN — Cartesian Product
Returns every possible combination of rows from both tables. Use with care on large tables — result size = rows_left × rows_right.
SELECT e.first_name, d.department_name
FROM employees e
CROSS JOIN departments d;
-- If employees has 3 rows and departments has 3 rows → 9 result rows
-- Useful for generating combinations, test data, or schedule matrices
SELF JOIN — Table Joined to Itself
A table joined to itself — the classic use case is an employee-manager hierarchy where both employees and managers are in the same table.
-- Find each employee and their manager's name
SELECT e.first_name AS employee,
m.first_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id
ORDER BY m.first_name, e.first_name;
Multi-Table Join
-- Join three tables: orders + customers + products
SELECT c.customer_name,
p.product_name,
o.quantity,
o.order_date
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products p ON p.product_id = o.product_id
WHERE o.order_date >= DATE '2025-01-01'
ORDER BY o.order_date DESC;
Table aliases (e, d, c) make multi-table queries dramatically more readable. Use short, meaningful aliases. Qualify every column with its alias when more than one table is in the FROM clause to avoid ambiguity and improve query clarity.
Aggregate Functions — COUNT, SUM, AVG, MIN, MAX
Aggregate functions compute a single result from a set of rows. They are the foundation of reporting and analytics queries.
-- All five core aggregates in one query
SELECT
COUNT(*) AS total_employees,
COUNT(commission) AS employees_with_commission, -- NULLs excluded
SUM(salary) AS total_payroll,
AVG(salary) AS average_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees
WHERE status = 'A';
| Column | Type | Description |
|---|---|---|
COUNT(*) | Number of rows | Counts all rows including NULLs |
COUNT(col) | Non-NULL rows | Excludes rows where col is NULL |
COUNT(DISTINCT col) | Unique non-NULL values | Excludes NULLs and duplicates |
SUM(col) | Total of values | Ignores NULLs — NULLs are treated as 0 contribution |
AVG(col) | Average of values | Ignores NULLs — denominator is count of non-NULL rows |
MIN(col) | Smallest value | Ignores NULLs |
MAX(col) | Largest value | Ignores NULLs |
If 10 employees have salary data and 2 have NULL, AVG(salary) divides by 10 (not 12). If NULLs mean "zero salary" in your context, replace them first: AVG(NVL(salary, 0)).
GROUP BY — Grouping & Summarizing
GROUP BY groups rows with the same values in specified columns, then aggregate functions compute a result per group.
-- Headcount and average salary per department
SELECT department_id,
COUNT(*) AS headcount,
ROUND(AVG(salary), 2) AS avg_salary,
SUM(salary) AS total_salary
FROM employees
WHERE status = 'A'
GROUP BY department_id
ORDER BY avg_salary DESC;
-- Group by multiple columns
SELECT department_id,
job_title,
COUNT(*) AS headcount,
MAX(salary) AS max_salary
FROM employees
GROUP BY department_id, job_title
ORDER BY department_id, headcount DESC;
This is the most common GROUP BY mistake. If your SELECT contains first_name, department, COUNT(*), then first_name and department must both be in GROUP BY (or wrapped in an aggregate). Most databases enforce this strictly.
HAVING — Filtering Groups
HAVING filters groups after aggregation, just as WHERE filters rows before aggregation.
-- Departments with more than 5 employees and average salary above 60k
SELECT department_id,
COUNT() AS headcount,
ROUND(AVG(salary),2) AS avg_salary
FROM employees
WHERE status = 'A' -- filter rows BEFORE grouping
GROUP BY department_id
HAVING COUNT() > 5 -- filter groups AFTER aggregation
AND AVG(salary) > 60000
ORDER BY avg_salary DESC;
| Column | Type | Description |
|---|---|---|
WHERE | Before GROUP BY | Filter individual rows — can reference any column |
HAVING | After GROUP BY | Filter groups — must reference aggregate expressions or GROUP BY columns |
Subqueries — Queries Inside Queries
A subquery is a SELECT statement nested inside another SQL statement. Subqueries can appear in WHERE, FROM, SELECT, or HAVING clauses.
Single-Row Subquery
Returns exactly one value — used with =, >, .
-- Employees earning above the company average
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;
Multi-Row Subquery
Returns multiple rows — used with IN, ANY, ALL.
-- Employees in departments located in 'North' region
SELECT first_name, department_id
FROM employees
WHERE department_id IN (
SELECT dept_id FROM departments WHERE region = 'North'
);
-- Employees earning more than ALL employees in HR
SELECT first_name, salary
FROM employees
WHERE salary > ALL (
SELECT salary FROM employees WHERE department_id = (
SELECT dept_id FROM departments WHERE department_name = 'HR'
)
);
Correlated Subquery
Executes once for each row in the outer query — the subquery references a column from the outer query.
-- Employees earning above the average for their own department
SELECT e.first_name,
e.salary,
e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id -- correlation: matches outer row's dept
)
ORDER BY e.department_id, e.salary DESC;
Inline View (Subquery in FROM)
A subquery in the FROM clause acts like a temporary table.
-- Top 5 departments by average salary
SELECT dept_stats.department_name,
dept_stats.avg_salary
FROM (
SELECT d.department_name,
ROUND(AVG(e.salary), 2) AS avg_salary
FROM employees e
JOIN departments d ON d.dept_id = e.department_id
GROUP BY d.department_name
) dept_stats
ORDER BY dept_stats.avg_salary DESC
FETCH FIRST 5 ROWS ONLY;
A correlated subquery runs once per outer row. On a table with 100,000 rows, that is 100,000 subquery executions. For performance, consider rewriting as a JOIN or using window functions (covered in the Advanced guide).
UNION, INTERSECT & EXCEPT
Set operators combine the result sets of two SELECT queries. Both queries must return the same number of columns in the same order with compatible data types.
UNION vs UNION ALL
-- UNION — combines results and removes duplicates
SELECT first_name, department FROM employees_us
UNION
SELECT first_name, department FROM employees_uk;
-- UNION ALL — keeps all rows including duplicates (faster — no deduplication)
SELECT product_id FROM orders_2024
UNION ALL
SELECT product_id FROM orders_2025;
-- Count how often each product appears across both years
SELECT product_id, COUNT(*) AS total_orders
FROM (
SELECT product_id FROM orders_2024
UNION ALL
SELECT product_id FROM orders_2025
) combined
GROUP BY product_id
ORDER BY total_orders DESC;
INTERSECT — Common Rows
-- Customers who placed orders in BOTH 2024 AND 2025
SELECT customer_id FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024
INTERSECT
SELECT customer_id FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2025;
EXCEPT / MINUS — Rows in First But Not Second
-- Products in the catalog but never ordered
SELECT product_id FROM products
EXCEPT -- PostgreSQL / SQL Server
SELECT product_id FROM orders;
-- Oracle uses MINUS instead of EXCEPT
SELECT product_id FROM products
MINUS
SELECT product_id FROM orders;
| Column | Type | Description |
|---|---|---|
UNION | Set operator | All rows from both queries. Yes. |
UNION ALL | Set operator | All rows from both queries. No — faster. |
INTERSECT | Set operator | Only rows present in both queries. Yes. |
EXCEPT / MINUS | Set operator | Rows in first query not in second. Yes. |
CASE — Conditional Logic in SQL
CASE is SQL's if-then-else. It evaluates conditions and returns different values based on which condition is true.
Simple CASE
-- Translate a code to a readable label
SELECT first_name,
status,
CASE status
WHEN 'A' THEN 'Active'
WHEN 'I' THEN 'Inactive'
WHEN 'T' THEN 'Terminated'
ELSE 'Unknown'
END AS status_label
FROM employees;
Searched CASE
-- Classify employees by salary band
SELECT first_name,
salary,
CASE
WHEN salary < 40000 THEN 'Entry Level'
WHEN salary BETWEEN 40000 AND 70000 THEN 'Mid Level'
WHEN salary BETWEEN 70001 AND 100000 THEN 'Senior Level'
ELSE 'Executive'
END AS salary_band
FROM employees
ORDER BY salary;
CASE in Aggregate Functions
-- Pivot-style: count employees by band in a single row
SELECT
COUNT(CASE WHEN salary < 40000 THEN 1 END) AS entry_level,
COUNT(CASE WHEN salary BETWEEN 40000 AND 70000 THEN 1 END) AS mid_level,
COUNT(CASE WHEN salary > 70000 THEN 1 END) AS senior_level
FROM employees
WHERE status = 'A';
COALESCE, NULLIF & NVL
These functions provide clean ways to handle NULL values without cluttering your queries with CASE expressions.
COALESCE — First Non-NULL Value
-- Return the first non-NULL contact method available
SELECT first_name,
COALESCE(mobile_phone, office_phone, personal_email, 'No contact') AS contact
FROM employees;
-- Replace NULL salary with 0 in calculations
SELECT first_name,
COALESCE(salary, 0) + COALESCE(commission, 0) AS total_comp
FROM employees;
NVL — Oracle's Two-Argument NULL Replace
-- Oracle: NVL(value, replacement_if_null)
SELECT first_name,
NVL(commission, 0) AS commission,
salary + NVL(commission, 0) AS total_comp
FROM employees;
-- NVL2(value, if_not_null, if_null) — Oracle only
SELECT first_name,
NVL2(commission, 'Has Commission', 'No Commission') AS commission_status
FROM employees;
NULLIF — Return NULL if Two Values Are Equal
-- Avoid division by zero: NULLIF(denominator, 0) returns NULL instead of error
SELECT product_name,
revenue,
units_sold,
revenue / NULLIF(units_sold, 0) AS revenue_per_unit
FROM sales;
-- Use NULLIF to suppress a default value
SELECT first_name,
NULLIF(notes, 'N/A') AS notes -- treats 'N/A' like NULL
FROM employees;
String Functions — CONCAT, LENGTH, SUBSTRING, TRIM
-- CONCAT (standard) / || (Oracle)
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
SELECT first_name || ' ' || last_name AS full_name FROM employees; -- Oracle
-- LENGTH / LEN — number of characters
SELECT product_code, LENGTH(product_code) AS code_length FROM products;
-- SUBSTR (Oracle) / SUBSTRING (MySQL, PostgreSQL)
-- SUBSTR(string, start_position, length)
SELECT SUBSTR('RetailCoder', 1, 6) FROM dual; -- 'Retail'
SELECT SUBSTR('RetailCoder', 8) FROM dual; -- 'Coder' (to end)
SELECT SUBSTRING('RetailCoder', 1, 6) FROM dual; -- MySQL/PostgreSQL
-- TRIM — remove leading and/or trailing characters
SELECT TRIM(' hello world ') FROM dual; -- 'hello world'
SELECT LTRIM(' hello') FROM dual; -- 'hello'
SELECT RTRIM('world ') FROM dual; -- 'world'
SELECT TRIM('x' FROM 'xxxhelloxx') FROM dual; -- 'hello'
-- UPPER / LOWER / INITCAP
SELECT UPPER('oracle sql') FROM dual; -- 'ORACLE SQL'
SELECT LOWER('ORACLE SQL') FROM dual; -- 'oracle sql'
SELECT INITCAP('oracle sql') FROM dual; -- 'Oracle Sql'
-- REPLACE
SELECT REPLACE('Oracle RMS 16.0', '16.0', '19.0') FROM dual;
-- INSTR — find position of a substring
SELECT INSTR('retailcoder@example.com', '@') AS at_position FROM dual; -- 12
-- LPAD / RPAD — pad to fixed width
SELECT LPAD(emp_id, 8, '0') FROM employees; -- '00000101'
Date Functions
-- Current date and time
SELECT SYSDATE FROM dual; -- Oracle: date + time
SELECT CURRENT_DATE FROM dual; -- Standard SQL
SELECT SYSTIMESTAMP FROM dual; -- Oracle: timestamp with timezone
-- TRUNC — remove time from date
SELECT TRUNC(SYSDATE) FROM dual; -- today at 00:00:00
SELECT TRUNC(SYSDATE, 'MM') FROM dual; -- first day of current month
SELECT TRUNC(SYSDATE, 'YYYY') FROM dual; -- first day of current year
-- TO_CHAR — format date as string (Oracle)
SELECT TO_CHAR(SYSDATE, 'DD-Mon-YYYY HH24:MI:SS') FROM dual;
SELECT TO_CHAR(hire_date, 'Day, DD Month YYYY') FROM employees;
-- TO_DATE — parse string to date (Oracle)
SELECT TO_DATE('15-Apr-2025', 'DD-Mon-YYYY') FROM dual;
SELECT TO_DATE('2025-04-15', 'YYYY-MM-DD') FROM dual;
-- ADD_MONTHS (Oracle) — add n months to a date
SELECT ADD_MONTHS(hire_date, 6) AS probation_end FROM employees;
-- MONTHS_BETWEEN (Oracle) — months between two dates
SELECT ROUND(MONTHS_BETWEEN(SYSDATE, hire_date)) AS tenure_months
FROM employees;
-- Date arithmetic (Oracle)
SELECT SYSDATE + 30 AS thirty_days_from_now FROM dual;
SELECT SYSDATE - 7 AS one_week_ago FROM dual;
-- EXTRACT — pull a component from a date
SELECT EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(MONTH FROM hire_date) AS hire_month
FROM employees;
-- MySQL equivalents
SELECT DATE_ADD(NOW(), INTERVAL 30 DAY) AS thirty_days;
SELECT DATEDIFF('2025-12-31', '2025-01-01') AS days_apart;
SELECT DATE_FORMAT(NOW(), '%d-%m-%Y') AS formatted;
Mathematical Functions
-- ROUND — round to n decimal places
SELECT ROUND(3.14159, 2) FROM dual; -- 3.14
SELECT ROUND(3.145, 2) FROM dual; -- 3.15
SELECT ROUND(salary / 22, 2) AS daily_rate FROM employees;
-- TRUNC (Oracle) / TRUNCATE (MySQL) — truncate without rounding
SELECT TRUNC(3.99, 0) FROM dual; -- 3 (Oracle)
SELECT TRUNCATE(3.99, 0) FROM dual; -- 3 (MySQL)
-- CEIL / CEILING — round UP to nearest integer
SELECT CEIL(4.1) FROM dual; -- 5
SELECT CEIL(-4.1) FROM dual; -- -4
-- FLOOR — round DOWN to nearest integer
SELECT FLOOR(4.9) FROM dual; -- 4
SELECT FLOOR(-4.9) FROM dual; -- -5
-- ABS — absolute value
SELECT ABS(-5000) FROM dual; -- 5000
-- MOD — modulo (remainder after division)
SELECT MOD(17, 5) FROM dual; -- 2
-- Practical use: identify even/odd rows
SELECT emp_id FROM employees WHERE MOD(emp_id, 2) = 0;
-- POWER / SQRT
SELECT POWER(2, 8) FROM dual; -- 256
SELECT SQRT(144) FROM dual; -- 12
-- SIGN — returns -1, 0, or 1 based on sign
SELECT SIGN(-500), SIGN(0), SIGN(300) FROM dual; -- -1, 0, 1
SQL Views
A view is a stored SQL query given a name. It behaves like a virtual table — you query a view exactly like a real table, but the database runs the underlying SELECT at query time.
Why use views:
- Hide complexity from end users and report writers
- Enforce row-level or column-level security (expose only permitted data)
- Create a stable interface when the underlying tables may change
- Avoid duplicating complex JOIN logic across many queries
CREATE VIEW
-- Create a view for active employees with department names
CREATE OR REPLACE VIEW vw_active_employees AS
SELECT e.emp_id,
e.first_name || ' ' || e.last_name AS full_name,
d.department_name,
e.salary,
e.hire_date
FROM employees e
JOIN departments d ON d.dept_id = e.department_id
WHERE e.status = 'A';
-- Use the view exactly like a table
SELECT * FROM vw_active_employees WHERE department_name = 'Finance';
SELECT department_name, AVG(salary) FROM vw_active_employees GROUP BY department_name;
CREATE OR REPLACE VIEW
-- Modify an existing view (Oracle / PostgreSQL)
CREATE OR REPLACE VIEW vw_active_employees AS
SELECT e.emp_id,
e.first_name || ' ' || e.last_name AS full_name,
d.department_name,
e.salary,
e.hire_date,
e.email -- added column
FROM employees e
JOIN departments d ON d.dept_id = e.department_id
WHERE e.status = 'A';
DROP VIEW
DROP VIEW vw_active_employees;
DROP VIEW IF EXISTS vw_active_employees; -- MySQL / PostgreSQL
Every time you SELECT from a view, the database executes the underlying query fresh. This means views always reflect the current state of the data, but they don't provide a performance benefit from caching. Use Materialized Views (Oracle) or indexed views (SQL Server) when you need pre-computed, cached results.
SQL Indexes
An index is a data structure that allows the database engine to find rows much faster — similar to an index at the back of a book. Without an index, the database must scan every row in a table (a full table scan) for each query.
CREATE INDEX
-- Single-column index on a frequently filtered column
CREATE INDEX idx_employees_dept
ON employees (department_id);
-- Composite index for queries that filter by both columns together
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);
-- Unique index — enforces uniqueness AND improves lookup speed
CREATE UNIQUE INDEX idx_employees_email
ON employees (email);
-- Function-based index (Oracle) — index on an expression
CREATE INDEX idx_upper_lastname
ON employees (UPPER(last_name));
-- Now this query can use the index
SELECT * FROM employees WHERE UPPER(last_name) = 'SHARMA';
DROP INDEX
-- Oracle
DROP INDEX idx_employees_dept;
-- MySQL
DROP INDEX idx_employees_dept ON employees;
-- PostgreSQL
DROP INDEX IF EXISTS idx_employees_dept;
Index Types
| Column | Type | Description |
|---|---|---|
B-Tree (default) | General purpose | Equality and range queries on almost any column. The default for CREATE INDEX. |
Unique Index | Uniqueness + speed | Columns that must be unique (email, SSN). Automatically created for PRIMARY KEY and UNIQUE constraints. |
Composite Index | Multi-column filters | Queries that filter on multiple columns together. Column order matters — leftmost columns are used first. |
Function-Based (Oracle) | Indexed expressions | Queries that filter on UPPER(col), LOWER(col), or other expressions. |
Bitmap (Oracle) | Low-cardinality columns | Columns with few distinct values (status, gender) in data warehouse / analytics contexts. |
Every INSERT, UPDATE, or DELETE on an indexed table must also update the index. More indexes = faster SELECT, slower DML. Index columns that appear in WHERE, JOIN ON, and ORDER BY clauses of your most critical queries. Don't index every column.
Common Gotchas
- !
JOIN without an ON condition creates a CROSS JOIN. In some SQL dialects, writing
FROM employees, departmentsor forgetting the ON clause silently produces a Cartesian product. Always explicitly write your JOIN type and ON condition. - !
Aggregate functions in WHERE cause an error. You cannot use
COUNT(*),SUM(), or other aggregate functions in a WHERE clause. Use HAVING to filter after grouping. The error message is often cryptic: "ORA-00934: group function is not allowed here." - !
UNION removes duplicates silently — use UNION ALL if you want all rows. If you're combining logs, event streams, or any data where duplicates are meaningful, use UNION ALL. UNION deduplication has a performance cost and can silently drop valid rows.
- !
Correlated subqueries are not always the right tool. Correlated subqueries execute once per outer row and can be extremely slow on large datasets. Most correlated subqueries can be rewritten as a JOIN or a window function, which the query optimizer can plan more efficiently.
- !
Function on an indexed column defeats the index.
WHERE UPPER(last_name) = 'SHARMA'cannot use a regular index onlast_name. Either create a function-based index onUPPER(last_name), or store data consistently (e.g., always in uppercase) to avoid the function call. - !
Too many indexes hurt write performance. Indexes are not free. Each index must be maintained on every INSERT/UPDATE/DELETE. A table with 15 indexes can be dramatically slower for bulk loads than one with 3 well-chosen indexes.
Key Takeaways
- ✓INNER JOIN returns only matched rows. LEFT JOIN keeps all left rows. FULL OUTER JOIN keeps all rows from both sides. Choose based on whether you need to preserve unmatched rows.
- ✓Aggregate functions (COUNT, SUM, AVG, MIN, MAX) ignore NULLs except COUNT(*). WHERE filters before aggregation; HAVING filters after aggregation.
- ✓Subqueries can go in WHERE (filter), FROM (inline view), SELECT (scalar), or HAVING. Correlated subqueries run once per outer row — use JOINs or window functions for better performance at scale.
- ✓UNION removes duplicates; UNION ALL keeps them all (and is faster). Use INTERSECT for common rows, and EXCEPT/MINUS for rows in one set but not another.
- ✓CASE is SQL's conditional logic. Use it in SELECT columns, in WHERE with complex conditions, or inside aggregate functions for pivot-style summaries.
- ✓COALESCE returns the first non-NULL value in a list. NVL is Oracle's two-argument version. NULLIF returns NULL when two values are equal — perfect for avoiding divide-by-zero errors.
- ✓Views are stored queries, not stored data. They simplify complex queries, enforce security, and create stable interfaces. Use Materialized Views when you need cached, precomputed results.
- ✓Indexes speed up reads but add overhead to writes. Index columns in WHERE, JOIN ON, and ORDER BY clauses of your critical queries. Avoid over-indexing.
