Lecture 01 · Fundamentals

Introduction to SQL & Setup

0️⃣ What is SQL?

SQL (Structured Query Language) is the ANSI‑standard language used to create, read, update, and delete data in relational database management systems (RDBMS). It abstracts the underlying storage engine and lets you work with tables (relations) that are linked via keys.

Brief History & Why It Still Matters

1970 – Edgar F. Codd proposes the relational model (IBM System R).
1974 – IBM releases SQL/2, the first commercial implementation.
1992 – ANSI/ISO adopts SQL‑92 as a formal standard (still the baseline).
2000‑2024 – Continual extensions (SQL:1999, SQL:2003, …) add XML/JSON handling, window functions, and more.

RDBMS	License	Typical Use‑Case	Latest Stable Version (2024)
PostgreSQL	Open‑source (PostgreSQL License)	Enterprise‑grade, geo‑spatial, analytics	16.2
MySQL / MariaDB	Dual (GPL / Commercial)	Web‑apps, LAMP stack	8.4 / 10.11
Microsoft SQL Server	Commercial (Free Express)	Windows‑centric, BI	2022 CU15
Oracle Database	Commercial (Free XE)	Large‑scale transaction processing	23c
SQLite	Public‑domain (no server)	Embedded, mobile, prototyping	3.45

Market Demand (2024)

Skill	Avg Salary (US $)	Stack Overflow Survey % of Respondents
SQL / Relational DB	≈ $115 k	78 %
PostgreSQL	≈ $120 k	43 %
MySQL	≈ $112 k	55 %
SQLite (mobile/IoT)	≈ $108 k	31 %

Source: Stack Overflow Developer Survey 2024 + Glassdoor.

⚠️️ Security Reminder

Even a simple SELECT can become dangerous when user input is concatenated directly into the query. Always use **parameterised queries** (prepared statements) to avoid SQL injection.

1️⃣ Setup – Get a DB Up & Running

Option A – Native Install (PostgreSQL)

Windows – Download the installer, run, accept defaults (creates a postgres super‑user).
macOS – brew install postgresql then brew services start postgresql.
Linux (Debian/Ubuntu) – sudo apt update && sudo apt install -y postgresql postgresql-contrib.
Verify: sudo -u postgres psql -c "SELECT version();"

Option B – Lightweight SQLite (Zero‑Install)

SQLite ships with Python, PHP, and many OSes. A single file is all you need.

bash

# Verify SQLite is present
sqlite3 --version    # e.g. 3.45.0

# Create a new DB file called demo.db and open the CLI
sqlite3 demo.db

# Inside the CLI you’ll see the prompt: sqlite>

Option C – Docker (All Platforms)

Docker gives you an isolated RDBMS with a single command. No admin rights required.

docker

# Pull the official PostgreSQL image
docker pull postgres:16-alpine

# Run it (exposes 5432 to host)
docker run -d \
    --name pg_sql_demo \
    -e POSTGRES_PASSWORD=SuperSecret123 \
    -e POSTGRES_USER=demo_user \
    -e POSTGRES_DB=demo_db \
    -p 5432:5432 \
    postgres:16-alpine

# Connect from host
psql -h localhost -U demo_user -d demo_db -W   # password: SuperSecret123

SQL Client Recommendations

DBeaver – universal cross‑platform GUI (JDBC/ODBC).
pgAdmin – native PostgreSQL web‑based console.
Azure Data Studio – Microsoft’s cross‑platform query editor (works with PostgreSQL, MySQL, SQL Server).
Command‑line: psql (PostgreSQL), mysql, sqlite3.

2️⃣ First‑Look SQL – “Hello SQL” & Basic Statements

SQL (pSQL prompt) – Hello

SELECT 'Hello, SQL Mastery!' AS greeting;

Output

greeting
Hello, SQL Mastery!

Core DDL (Data Definition Language)

CREATE TABLE – define a table schema.
ALTER TABLE – add/modify columns, constraints.
DROP TABLE – delete a table.

Core DML (Data Manipulation Language)

SELECT … FROM … – read data (the “R” in CRUD).
INSERT INTO … VALUES … – add rows.
UPDATE … SET … WHERE … – modify rows.
DELETE FROM … WHERE … – remove rows.

💻 Try It Yourself - Multi-Language Compiler

Practice SQL and many other programming languages right here in your browser! Switch between languages, modify the code, and click "Run" to see results instantly.

💡 Practice Tips:

Switch to SQL in the language selector and try the database query examples
Experiment with SQL's SELECT, INSERT, UPDATE, and DELETE statements
Try other data languages like Python, R, or compare with database concepts
Use the "Load Example" button to see SQL-specific code samples
Use Ctrl+Enter to quickly run your code

Quick Reference Cheat‑Sheet

Statement	Purpose	Basic Syntax Example
CREATE TABLE	Define a new relation	`CREATE TABLE employees (id SERIAL PRIMARY KEY, name TEXT NOT NULL, dept_id INT);`
INSERT	Add a row	`INSERT INTO employees (name,dept_id) VALUES ('Alice',1);`
SELECT	Read data	`SELECT name,dept_id FROM employees WHERE dept_id = 1;`
UPDATE	Modify existing rows	`UPDATE employees SET dept_id = 2 WHERE name = 'Alice';`
DELETE	Remove rows	`DELETE FROM employees WHERE name = 'Alice';`
JOIN	Combine rows from multiple tables	`SELECT e.name, d.name FROM employees e JOIN departments d ON e.dept_id = d.id;`
GROUP BY / HAVING	Aggregate data	`SELECT dept_id, COUNT() FROM employees GROUP BY dept_id HAVING COUNT() > 5;`
ORDER BY	Sort results	`SELECT * FROM employees ORDER BY name ASC;`
LIMIT / OFFSET	Paginate results	`SELECT * FROM employees LIMIT 10 OFFSET 20;`

3️⃣ Sample Schema – Employees & Departments

Let’s build a tiny HR database that you can reuse for many hands‑on tasks.

SQL – Schema Creation (PostgreSQL)

CREATE TABLE departments (
    id   SERIAL      PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE employees (
    emp_id      SERIAL      PRIMARY KEY,
    first_name  VARCHAR(30) NOT NULL,
    last_name   VARCHAR(30) NOT NULL,
    email       VARCHAR(100) UNIQUE,
    hire_date   DATE        DEFAULT CURRENT_DATE,
    salary      NUMERIC(9,2) CHECK (salary > 0),
    dept_id     INT         REFERENCES departments(id) ON DELETE SET NULL
);

Insert Sample Data

SQL – Sample Rows

INSERT INTO departments (name) VALUES
    ('Engineering'), ('Marketing'), ('HR'), ('Finance');

INSERT INTO employees (first_name, last_name, email, salary, dept_id) VALUES
    ('Alice',  'Smith',   'alice.smith@example.com',  87000, 1),
    ('Bob',    'Brown',   'bob.brown@example.com',    62000, 2),
    ('Carol',  'Jones',   'carol.jones@example.com',  54000, 3),
    ('David',  'Lee',     'david.lee@example.com',   103000, 1),
    ('Eve',    'Taylor',  'eve.taylor@example.com',   77000, 4);

Visualise the Relationship (placeholder)

ER‑diagram (departments 1‑* employees) goes here

Example Queries

SQL – 5 Common Queries

-- 1️⃣ List all employees with department name
SELECT e.emp_id, e.first_name, e.last_name, d.name AS department
FROM   employees e
JOIN   departments d ON e.dept_id = d.id
ORDER BY e.emp_id;

-- 2️⃣ Average salary per department
SELECT d.name, ROUND(AVG(e.salary),2) AS avg_salary
FROM   employees e
JOIN   departments d ON e.dept_id = d.id
GROUP BY d.name
HAVING COUNT(*) > 0;

-- 3️⃣ Employees hired in the last 30 days
SELECT * FROM employees
WHERE hire_date > CURRENT_DATE - INTERVAL '30 days';

-- 4️⃣ Find employees with duplicate email domains (e.g., possible data‑quality issue)
SELECT SUBSTRING(email FROM POSITION('@' IN email)+1) AS domain, COUNT(*) AS cnt
FROM   employees
GROUP BY domain
HAVING COUNT(*) > 1;

-- 5️⃣ Delete a department (will set dept_id of its employees to NULL)
DELETE FROM departments WHERE name = 'Marketing';

4️⃣ Indexes & Query Performance Basics

Why Indexes Matter

They are data structures (usually B‑trees) that let the engine locate rows without scanning the whole table.
Typical usage: columns in WHERE, JOIN, ORDER BY, GROUP BY.
Trade‑off: faster reads, slower writes (INSERT/UPDATE) and extra storage.

Creating an Index

SQL – Create Index

CREATE INDEX idx_emp_email ON employees(email);
CREATE INDEX idx_emp_dept ON employees(dept_id);

Explain Plan – Seeing What the Optimiser Does

SQL – EXPLAIN (PostgreSQL)

EXPLAIN ANALYZE
SELECT e.first_name, e.last_name, d.name
FROM   employees e
JOIN   departments d ON e.dept_id = d.id
WHERE  e.salary > 80000;

The output shows whether the query uses an index scan, sequential scan, or a hash join, and reports actual execution time in ms.

5️⃣ Common Pitfalls – SQL Injection & Safe Coding

SQL Injection occurs when untrusted input is concatenated into a query string without sanitisation.

Vulnerable Example (PHP)

PHP – Unsafe Query

$id   = $_GET['id'];                 // e.g., ?id=1 OR 1=1
$sql  = "SELECT * FROM employees WHERE emp_id = $id";
$res  = $pdo->query($sql);

Safe Alternative – Prepared Statements

PHP – PDO Prepared

$stmt = $pdo->prepare('SELECT * FROM employees WHERE emp_id = :id');
$stmt->execute(['id' => $_GET['id']]);   // type‑safe, DB driver does the escaping
$res  = $stmt->fetchAll();

All languages have similar mechanisms (Python’s cursor.execute(..., params), Java’s PreparedStatement, .NET’s SqlCommand with parameters, etc.).

🚨 Never

Use string interpolation to build SQL.
Trust client‑side validation alone.
Store raw passwords in plain text – always hash (bcrypt, Argon2).

6️⃣ Lab Exercises (Extended)

🎯 Exercise 1.1 – Install & Verify a Database

Choose one of the three installation methods (native PostgreSQL, SQLite, Docker). Then:

Start the DB server (or open the SQLite CLI).
Run the “Hello, SQL Mastery!” query and capture a screenshot of the output.
Record the exact command you typed (including any environment variables).

🎯 Exercise 1.2 – Build the Employees Sample Schema

Using the SQL script provided above:

Create the departments and employees tables.
Insert the sample rows.
Run the “Average salary per department” query and note any department with an average > $80 k.
Write a 150‑word “lab journal” describing any errors you encountered and how you fixed them.

🎯 Exercise 1.3 – Index & Explain

For the employees.email column:

Create an index as shown earlier.
Run the same SELECT … WHERE email = 'alice.smith@example.com' query with the index and capture the EXPLAIN ANALYZE output.
Delete the index (DROP INDEX idx_emp_email;) and re‑run the query – compare timings.
Summarise (≤ 100 words) why the index helped (or didn’t) in this case.

🎯 Exercise 1.4 – Safe Coding Mini‑Project (any language)

Write a short program (≤ 50 lines) that:

Accepts a user‑supplied employee ID from the console.
Queries the employees table for that ID using a prepared statement.
Prints the employee’s full name and department.

Include the source code and a screenshot of a successful run.

7️⃣ Knowledge‑Check Quiz (5 questions)

Which SQL clause is used to combine rows from two tables?
GROUP BY
JOIN
UNION
What does the SERIAL data type do in PostgreSQL?
Stores binary data.
Auto‑increments an integer and creates a sequence.
Stores a fixed-length string of 20 characters.
True or False – DROP TABLE can be rolled back inside a transaction in PostgreSQL.
Which of the following is a **safe** way to guard against SQL injection?
Escape single quotes manually.
Use a prepared statement / parameterised query.
Validate the input length only.
What is the purpose of an INDEX?

Click to reveal the answers.

Show Answers

B – JOIN combines rows from two tables based on a condition.
B – SERIAL creates an auto‑increment integer backed by a sequence.
False – Dropping a table is DDL and cannot be rolled back in PostgreSQL (unless using a schema‑level transaction in recent versions, but standard practice treats it as non‑reversible).
B – Parameterised queries are the industry‑standard defense.
An INDEX provides a fast lookup mechanism (typically a B‑tree) that reduces the amount of data scanned for queries that filter, sort, or join on the indexed column(s).

8️⃣ Further Reading & Resources

Books
- SQL for Beginners – John J. L. (O'Reilly, 2022)
- PostgreSQL Administration – Simon Riggs & Gianni Ciolli (2nd ed., 2023)
- SQL Antipatterns – Bill Karwin (Addison‑Wesley, 2010)
- Database System Concepts – Silberschatz, Korth & Sudarshan (7th ed., 2020)
Online Tutorials
Reference Docs
- SQL‑92 Standard – ISO/IEC 9075‑2:1992
- PostgreSQL SQL‑Syntax – PostgreSQL Docs
- SQLite Quickstart – sqlite.org
Practice Platforms

Lecture 02 · Fundamentals

Basic SELECT Queries

1️⃣ Retrieving Data – The Core of SQL

The SELECT statement is the primary read‑only operation in a relational database. It defines **what** columns you want, **from which** tables, and optionally how the rows should be filtered, ordered, grouped, or limited.

SELECT name, price FROM products;
SELECT * FROM employees; -- all columns (discouraged in production)

⚠️ Why SELECT * Is Discouraged

Pulls every column – can cause unnecessary I/O and network traffic.
Masks schema changes (new columns appear silently in result sets).
Exposes sensitive columns (password hashes, internal IDs) if you forget to prune them.

2️⃣ Aliases – Giving Columns & Tables Friendly Names

Aliases improve readability, especially in joins or when column names clash.

SELECT first_name AS name FROM users;
SELECT u.first_name AS fname,
       u.last_name  AS lname FROM users AS u;

Table aliases also let you qualify columns (t1.id) which is mandatory when two tables share a column name.

3️⃣ Filtering Rows – `WHERE` Clause

SELECT name, price
FROM products
WHERE price > 100 AND category = 'Electronics';

Common comparison operators: =, <>, >, <, >=, <= and set operators IN, NOT IN. Use LIKE for pattern matching.

SELECT *
FROM employees
WHERE email LIKE '%@example.com';   -- ends with @example.com

4️⃣ Ordering & Limiting – `ORDER BY`, `LIMIT`/`OFFSET`

SELECT name, price
FROM products
WHERE price > 0
ORDER BY price DESC
LIMIT 10 OFFSET 20;   -- pagination: page 3 (0‑based)

Use ASC (default) for ascending order. OFFSET is optional – without it, LIMIT 10 simply returns the first ten rows.

5️⃣ Removing Duplicates – `DISTINCT`

SELECT DISTINCT department_id FROM employees;

DISTINCT works on the whole projected row. If you need distinct on a single column but keep other columns, use a GROUP BY (see next section).

6️⃣ Aggregates & Grouping – Summaries & Statistics

Function	Description	Example
`COUNT(col)`	Number of non‑NULL values	`SELECT COUNT(*) FROM orders;`
`SUM(col)`	Total of numeric column	`SELECT SUM(amount) FROM sales;`
`AVG(col)`	Mean value	`SELECT AVG(salary) FROM employees;`
`MIN(col)`	Smallest value	`SELECT MIN(price) FROM products;`
`MAX(col)`	Largest value	`SELECT MAX(price) FROM products;`

To obtain aggregates **per group**, combine them with GROUP BY:

SELECT department_id,
       COUNT(*)                AS employee_cnt,
       AVG(salary)             AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;   -- keep only large departments

HAVING filters groups after aggregation (equivalent to WHERE for rows).

7️⃣ Joins – Bringing Data Together

Four basic join types (inner, left/right outer, full outer, cross). The most common is INNER JOIN.

SELECT e.first_name,
       e.last_name,
       d.name AS department
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id
WHERE d.name = 'Engineering'
ORDER BY e.last_name;

For left/right outer join:

SELECT c.name AS customer,
       o.id   AS order_id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
ORDER BY c.name;

Use FULL OUTER JOIN when you need rows that appear in *either* table (PostgreSQL supports it; MySQL does not natively).

8️⃣ Set Operators – Combining Result Sets

SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;   -- eliminates duplicates

SELECT email FROM customers
INTERSECT
SELECT email FROM newsletter_subscribers;   -- only common emails

SELECT email FROM customers
EXCEPT
SELECT email FROM newsletter_subscribers;   -- in customers but not in subscribers

All participating SELECTs must have the **same column count** and **compatible data types**.

9️⃣ Sub‑queries – Nesting SELECTs

Inline (derived table)

SELECT dept_id, avg_salary
FROM (
    SELECT department_id AS dept_id,
           AVG(salary)    AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_avg
WHERE avg_salary > 80000;

Correlated sub‑query (depends on outer row)

SELECT e.id, e.first_name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) FROM employees
    WHERE department_id = e.department_id
);

Correlated sub‑queries can be slower; many RDBMS transform them into joins automatically.

10️⃣ Window Functions – Row‑Level Analytics

SELECT employee_id,
       salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
       RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
ORDER BY department_id, salary DESC;

Window functions keep the original row granularity while adding a calculated column (e.g., running total, moving average, rank).

11️⃣ Performance Tips – `EXPLAIN` and Indexes

Before writing a complex SELECT, inspect the execution plan:

EXPLAIN ANALYZE
SELECT e.first_name, e.last_name, d.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Engineering'
ORDER BY e.last_name
LIMIT 50;

The output tells you whether an **index scan**, **sequential scan**, or **hash join** is used. If you see “Seq Scan” on a large table with a WHERE clause on an indexed column, create an index:

CREATE INDEX idx_emp_dept ON employees(department_id);
CREATE INDEX idx_dept_name ON departments(name);

Remember: each index speeds reads but slows INSERT/UPDATE/DELETE and consumes storage.

12️⃣ Safety – Avoiding SQL Injection in SELECTs

Even a read‑only query can be hijacked to return data you never intended. Always use **parameterised queries** / **prepared statements** in application code.

Python (psycopg2) – Safe SELECT

import psycopg2

conn = psycopg2.connect(dsn='dbname=mydb user=app')
cur  = conn.cursor()
cur.execute(
    "SELECT name, price FROM products WHERE price > %s AND category = %s;",
    (100, 'Electronics')
)
rows = cur.fetchall()
for r in rows:
    print(r)

Never concatenate user‑input directly into the SQL string.

13️⃣ Cheat‑Sheet – SELECT Syntax Overview

Clause	Purpose	Typical Syntax
`SELECT`	Columns / expressions to return	`SELECT col1, col2, …`
`FROM`	Source table(s) / subquery	`FROM table_name [AS t]`
`WHERE`	Row filter	`WHERE condition`
`GROUP BY`	Aggregate grouping	`GROUP BY col1, col2`
`HAVING`	Filter on aggregated groups	`HAVING aggregate_condition`
`ORDER BY`	Sort result set	`ORDER BY col [ASC\|DESC]`
`LIMIT / OFFSET`	Pagination	`LIMIT 10 OFFSET 20`
`DISTINCT`	Remove duplicate rows	`SELECT DISTINCT col FROM …`
`JOIN`	Combine tables	`JOIN other_table ON condition`
`UNION / INTERSECT / EXCEPT`	Set algebra	`SELECT … UNION SELECT …`
`WINDOW`	Analytic functions	`OVER (PARTITION BY … ORDER BY …)`

14️⃣ Sample Schema – Customers, Orders, Products

ER‑diagram (Customers ←→ Orders → Products) placeholder

15️⃣ Lab Exercises (Extended)

🎯 Exercise 2.1 – Build the Sample Schema

Using the diagram above, create three tables in PostgreSQL (or SQLite) with appropriate primary/foreign keys. Then insert at least 10 rows per table (use any fictional data you like).

🎯 Exercise 2.2 – Basic Queries

List all products cheaper than $50, ordered by price ascending.
Show the total amount each customer has spent (use SUM and GROUP BY).
Find customers who placed more than 3 orders (use HAVING).
Display the top 5 most‑expensive orders (use ORDER BY and LIMIT).

🎯 Exercise 2.3 – Joins & Sub‑queries

Show each order with the customer’s name and the product’s name (inner join across three tables).
Find products that have never been ordered (left join + WHERE order_id IS NULL).
Write a correlated sub‑query that returns customers whose total spend is above the **average** spend of all customers.

🎯 Exercise 2.4 – Window Functions

For each order, compute the running total per customer (order rows sorted by order_date) using SUM() OVER (PARTITION BY … ORDER BY …). Capture the query and a screenshot of the result.

16️⃣ Knowledge‑Check Quiz (7 questions)

17️⃣ Further Reading & Resources

Books
- SQL Fundamentals – John J. L. (O'Reilly, 2022)
- PostgreSQL Up and Running – Regina O. Obe (3rd ed., 2023)
- SQL Antipatterns – Bill Karwin (Addison‑Wesley, 2010)
- Mastering SQL Queries – Mark S. (Packt, 2021)
Official Docs
Interactive Practice

Lecture 03 · Fundamentals

Filtering & Sorting

1️⃣ The `WHERE` Clause – Selecting the Right Rows

WHERE is the gate‑keeper of a SELECT statement. Every row that satisfies the Boolean expression after WHERE is passed downstream to ORDER BY, LIMIT, etc.

1.1 Comparison & Logical Operators

Operator	Description	Example
`=`	Equality	`price = 199.99`
`<> or !=`	Inequality	`category <> 'Books'`
`> < >= <=`	Numeric / date comparison	`age >= 21`
`BETWEEN … AND …`	Inclusive range	`price BETWEEN 50 AND 150`
`IN ( … )`	Set membership	`status IN ('active','pending')`
`LIKE / ILIKE`	Pattern matching (wildcards `%` and `_`)	`email LIKE '%@example.com'`
`IS NULL / IS NOT NULL`	Null test (cannot use `= NULL`)	`last_login IS NULL`
`AND / OR / NOT`	Boolean combinators	`country = 'USA' AND age >= 18`
`EXISTS (sub‑query)`	True if sub‑query returns at least one row	`WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)`
`ANY / ALL`	Compare to a set of values	`salary > ANY (SELECT salary FROM employees WHERE department = 'HR')`

1.2 Practical `WHERE` Samples

SELECT *
FROM products
WHERE price > 100
  AND stock >= 10
  AND category IN ('Electronics','Appliances')
  AND discontinued IS NULL;

SELECT *
FROM users
WHERE country = 'USA'
  AND age >= 18
  AND email LIKE '%@example.com';

SELECT *
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';

SELECT *
FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d WHERE d.id = e.department_id
    AND d.name = 'Engineering'
);

⚠️ Performance Tip

**Never** wrap an indexed column in a function in the WHERE clause (e.g., WHERE UPPER(name) = 'ALICE') – this defeats the index. Use a functional index or a case‑insensitive collation instead.

2️⃣ Ordering Results – `ORDER BY`

2.1 Basic Syntax

Sort by one or more columns, each optionally marked ASC (default) or DESC.

SELECT *
FROM products
ORDER BY price DESC;   -- expensive first

SELECT *
FROM employees
ORDER BY department_id ASC, last_name ASC;

2.2 Controlling NULL Placement

Some engines put NULL values first, others last. Use the explicit NULLS FIRST or NULLS LAST clause.

SELECT *
FROM users
ORDER BY last_login DESC NULLS LAST;

2.3 Case‑Insensitive / Locale‑Aware Sorting

PostgreSQL lets you attach a collation directly in the ORDER BY clause:

SELECT name
FROM customers
ORDER BY name COLLATE \"C\";   -- case‑insensitive, language‑aware

2.4 Ordering by an Expression (Custom Sort)

Advanced use‑cases often need a derived value, e.g., sorting by the length of a string:

SELECT product_name, price
FROM products
ORDER BY LENGTH(product_name) ASC, price DESC;

2.5 Random Ordering (caution!)

For testing or sampling you can randomise rows, but it forces a full‑table sort and is expensive on large tables.


SELECT *
FROM quotes
ORDER BY RANDOM()
LIMIT 1;


SELECT *
FROM quotes
ORDER BY RAND()
LIMIT 1;

3️⃣ Pagination – `LIMIT` & `OFFSET`

Web applications often need “pages” of data. The canonical pattern is:

SELECT *
FROM products
ORDER BY product_id
LIMIT 20          -- page size
OFFSET 40;        -- skip first two pages (0‑based)

Tip: Combine pagination with a stable ordering column (e.g., primary key) to get deterministic results.

Keyset Pagination (Seek Method)

When a table grows, OFFSET becomes slower. Use “keyset pagination”:

SELECT *
FROM products
WHERE product_id > 120   -- last ID from previous page
ORDER BY product_id
LIMIT 20;

4️⃣ Performance – How Filtering & Sorting Interact with Indexes

4.1 Index Usage Basics

A B‑tree index can satisfy both WHERE predicates and ORDER BY if the sort order matches the index definition.
Example: CREATE INDEX idx_price ON products(price DESC); lets the engine fetch rows already sorted – eliminating a separate sort phase.

4.2 EXPLAIN – Visualising the Plan

EXPLAIN ANALYZE
SELECT *
FROM products
WHERE price > 100
ORDER BY price DESC
LIMIT 10;

The output shows whether the planner uses an Index Scan (good) or a Seq Scan + Sort (costly).

4.3 Common Pitfalls

Applying a function to an indexed column (WHERE LOWER(email) = 'alice@example.com') disables index usage. Instead create a functional index: CREATE INDEX idx_email_lower ON users (LOWER(email));
Using OR across unrelated columns often prevents the optimizer from using indexes. Split the query with UNION ALL if possible.
Sorting on a calculated column (e.g., ORDER BY price * tax_rate) forces a temporary file sort.

5️⃣ Security – Safe Filtering & Sorting

Even read‑only queries can be abused for data exfiltration if user input is concatenated unchecked.

Node.js (pg) – Parameterised SELECT

const { Client } = require('pg');
const client = new Client(/* connection params */);

async function getProducts(minPrice, maxPrice, sortCol, sortDir) {
    // whitelist column names & sort direction to prevent injection
    const allowedCols = ['price', 'name', 'created_at'];
    const allowedDir = ['ASC', 'DESC'];

    if (!allowedCols.includes(sortCol) || !allowedDir.includes(sortDir)) {
        throw new Error('Invalid sort parameters');
    }

    const sql = `SELECT *
                 FROM products
                 WHERE price BETWEEN $1 AND $2
                 ORDER BY ${sortCol} ${sortDir}
                 LIMIT 20`;
    const res = await client.query(sql, [minPrice, maxPrice]);
    return res.rows;
}`;

**Key take‑aways**:

Never interpolate raw user input directly into SQL (even for ORDER BY).
Whitelist column names / directions, or map them to an enum.
Use prepared statements for values ($1, $2 in PostgreSQL, ? in MySQL).

6️⃣ Cheat‑Sheet – Filtering & Sorting Summary

Clause	Purpose	Typical Syntax
`WHERE`	Row filtering	`WHERE col = 5 AND other < 10`
`BETWEEN … AND …`	Inclusive range	`WHERE price BETWEEN 20 AND 30`
`IN ( … )`	Set membership	`WHERE status IN ('open','closed')`
`LIKE / ILIKE`	Pattern matching	`WHERE name LIKE 'A%'`
`IS NULL / IS NOT NULL`	Null test	`WHERE deleted_at IS NULL`
`EXISTS (sub‑query)`	True if sub‑query returns rows	`WHERE EXISTS (SELECT 1 …)`
`ORDER BY`	Sort result set	`ORDER BY col1 DESC, col2 ASC`
`NULLS FIRST / LAST`	Explicit null ordering	`ORDER BY created_at DESC NULLS LAST`
`LIMIT / OFFSET`	Pagination	`LIMIT 20 OFFSET 40`
`CASE … END`	Custom sort order	`ORDER BY CASE status WHEN 'high' THEN 1 … END`

7️⃣ Sample Schema – Customers, Orders, Products

ER‑diagram (Customers ←→ Orders ←→ Products) placeholder

8️⃣ Lab Exercises (Hands‑On)

🎯 Exercise 3.1 – Filtering Challenge

Using the sample schema (create customers, orders, products tables with at least 20 rows each):

Find all orders placed in the last 30 days with a total amount > $200.
List customers whose email ends with @example.org **and** who have never placed an order.
Show products that have never been ordered (hint: LEFT JOIN + WHERE order_id IS NULL).
Retrieve the top 5 most‑expensive products, ordered by price descending.

Capture the SQL statements and a screenshot of each query’s result.

🎯 Exercise 3.2 – Sorting & Pagination

Write a query that sorts customers by last_name ascending, then first_name descending.
Implement keyset pagination for the orders table (use order_id as the cursor).
Compare the execution time of the same query using OFFSET/LIMIT vs. keyset pagination on a table with 10 000 rows.
Explain why the keyset method is faster on large offsets.

🎯 Exercise 3.3 – Index Impact

On the products table:

Create a B‑tree index on price DESC.
Run the query SELECT * FROM products WHERE price > 500 ORDER BY price DESC LIMIT 20; and capture the EXPLAIN ANALYZE output before and after the index.
Summarise (≤ 100 words) how the index changed the plan and execution time.

9️⃣ Knowledge‑Check Quiz (7 questions)

10️⃣ Further Reading & Resources

Books
- SQL Performance Explained – Markus Winand (2nd ed., 2021)
- SQL Antipatterns – Bill Karwin (Addison‑Wesley, 2010)
- PostgreSQL Up and Running – Regina O. Obe (3rd ed., 2023)
Official Docs
Interactive Practice

Lecture 04 · Fundamentals

Joins – Combining Data From Multiple Tables

0️⃣ Why Joins Matter

Real‑world databases are normalized: data is split across related tables to avoid redundancy. A join stitches those pieces back together so a security analyst (or any data‑consumer) can answer questions such as:

Which user bought the most‑expensive product?
What is the total revenue per country?
Are there any “orphaned” orders without a matching user?

1️⃣ Sample Schema (Orders, Users, Products, Order_Items)

ER diagram (Users ↔ Orders ↔ Order_Items ↔ Products) placeholder

Below is the DDL you can paste into any RDBMS (PostgreSQL/MySQL/SQLite) to create the demo data.

schema.sql

CREATE TABLE users (
    id          SERIAL      PRIMARY KEY,
    name        VARCHAR(50)   NOT NULL,
    email       VARCHAR(100)  UNIQUE,
    country     VARCHAR(50)
);

CREATE TABLE products (
    id          SERIAL      PRIMARY KEY,
    sku         VARCHAR(30)   UNIQUE,
    name        VARCHAR(100)  NOT NULL,
    price       NUMERIC(9,2) NOT NULL
);

CREATE TABLE orders (
    id          SERIAL      PRIMARY KEY,
    user_id     INT         REFERENCES users(id),
    order_date  TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    id          SERIAL      PRIMARY KEY,
    order_id    INT         REFERENCES orders(id),
    product_id  INT         REFERENCES products(id),
    quantity    INT         DEFAULT 1
);

2️⃣ Join Types – Theory (with Venn diagrams)

Join Type	Result Set	Typical Syntax
INNER JOIN	Only rows that have matching keys in both tables.	`SELECT … FROM A INNER JOIN B ON A.id = B.fk`
LEFT (OUTER) JOIN	All rows from the left table + matching rows from the right; NULL where no match.	`SELECT … FROM A LEFT JOIN B ON A.id = B.fk`
RIGHT (OUTER) JOIN	All rows from the right table + matching rows from the left; NULL where no match.	`SELECT … FROM A RIGHT JOIN B ON A.id = B.fk`
FULL (OUTER) JOIN	All rows from both tables; NULLs where one side is missing.	`SELECT … FROM A FULL OUTER JOIN B ON A.id = B.fk`
CROSS JOIN	Cartesian product (every row of A paired with every row of B). Use with care!	`SELECT … FROM A CROSS JOIN B`
NATURAL JOIN	Implicit join on all columns with the same name. Rarely used in security reporting – can hide bugs.	`SELECT … FROM A NATURAL JOIN B`

Below are simple Venn‑diagram placeholders for each join type. Replace them with real SVG/PNG graphics in your LMS.

Join	Diagram
INNER	Venn – overlapping area only
LEFT	All of left + overlap
RIGHT	All of right + overlap
FULL	Entire left & right circles
CROSS	Cartesian grid

3️⃣ Practical Join Syntax – Variations & Shortcuts

3.1 Using `USING` (when column names match)

SELECT o.id, u.name
FROM orders o
INNER JOIN users u USING(user_id);   -- automatically matches on o.user_id = u.id

3.2 Table Aliases – make queries readable

SELECT o.id AS order_id,
       u.name AS buyer,
       p.name AS product,
       oi.quantity
FROM order_items oi
JOIN orders o   ON oi.order_id   = o.id
JOIN users u    ON o.user_id    = u.id
JOIN products p ON oi.product_id = p.id
WHERE p.price > 1000
ORDER BY o.id;

3.3 Self‑Join – hierarchical data (e.g., employee‑manager)

SELECT e.name AS employee,
       m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY e.name;

3.4 Joining More Than Two Tables

The same JOIN keyword can be chained; the DB engine builds an internal join tree.

SELECT u.name,
       SUM(p.price * oi.quantity) AS total_spent
FROM users u
JOIN orders o       ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p    ON oi.product_id = p.id
GROUP BY u.name
HAVING total_spent > 5000
ORDER BY total_spent DESC;

4️⃣ Common Pitfalls & Performance Tips

Cartesian product – forgetting the ON clause (or using CROSS JOIN) creates huge result sets that can crash the DB or hide the real problem.
Duplicate rows – when joining a one‑to‑many relationship, SELECT DISTINCT may be needed, but better to aggregate (GROUP BY) rather than hide the symptom.
Missing indexes on foreign‑key columns – each JOIN ideally uses an index on the joining column (e.g., CREATE INDEX idx_orders_user_id ON orders(user_id);).
Ordering before joining – give the optimizer the chance to push ORDER BY down to the smallest sub‑plan. Usually you order after the final join.
Implicit joins (comma‑separated FROM) – legacy syntax FROM a, b WHERE a.id=b.id is harder to read and more error‑prone.

4.1 How the Optimizer Builds a Join Tree

Modern engines (PostgreSQL, MySQL, SQL‑Server) evaluate joins in the order that yields the lowest estimated cost. You can view the plan with EXPLAIN (or EXPLAIN ANALYZE for real timing).

EXPLAIN ANALYZE
SELECT u.name, o.id, p.name, oi.quantity
FROM users u
JOIN orders o       ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p    ON oi.product_id = p.id
WHERE p.price > 1000
LIMIT 20;

The output will show whether each join is using an Index Scan, a Hash Join, or a Nested Loop. If you see a Seq Scan on a large table where a foreign key exists, add an index and re‑run.

5️⃣ Hands‑On Lab – Join Challenges

🎯 Exercise 4.1 — Build the Sample Database

Import schema.sql (provided above) into your favourite RDBMS. Populate it with at least the following rows (feel free to add more):

Table	Sample Rows
users	INSERT INTO users (name,email,country) VALUES ('Alice','alice@example.com','USA'), ('Bob','bob@example.com','India'), ('Charlie','charlie@example.com','UK');
products	INSERT INTO products (sku,name,price) VALUES ('SKU-001','Gaming Laptop',2500.00), ('SKU-002','Office Chair',150.00), ('SKU-003','USB‑C Hub',45.99);
orders	INSERT INTO orders (user_id,order_date) VALUES (1,'2024-04-01'),(2,'2024-04-03'),(1,'2024-04-05');
order_items	INSERT INTO order_items (order_id,product_id,quantity) VALUES (1,1,1),(1,3,2), (2,2,1), (3,1,1),(3,2,1);

🎯 Exercise 4.2 — Write Five Queries

Inner join – List every order (order_id) together with the buyer’s name and the total amount for that order (sum of price × quantity).
Left join – Show all users and the list of orders they placed; include users with zero orders (display NULL for order_id).
Self‑join – Assuming the users table had a referrer_id column (who invited them), write a query that shows each user and the name of the referrer. (You may add a dummy column to practice.)
Cross join (Cartesian product) – caution – Show the total number of possible user × product pairs. Then limit the output to the first 5 rows to avoid flooding the console.
Full outer join (PostgreSQL only) – Return every user and every order, even if there is no matching pair (use FULL OUTER JOIN). Identify “orphaned” orders (no user) and “orphaned” users (no order) in the same result set.

Take a screenshot of each query’s result and include it in a short lab‑report (max 300 words) describing what you learned.

🎯 Exercise 4.3 — Index Performance Test

Run the inner‑join revenue query from Exercise 4.2 on a table with at least 10 000 rows (you can generate dummy data with a script). Capture the EXPLAIN ANALYZE output before adding indexes on orders.user_id, order_items.order_id and order_items.product_id. Then add the indexes, re‑run the same EXPLAIN ANALYZE and compare the total execution time.

Write a 150‑word summary explaining the difference.

6️⃣ Cheat‑Sheet – Join Syntax at a Glance

Syntax	When to Use
`FROM A INNER JOIN B ON A.id = B.fk`	Only records with matching keys (most common).
`FROM A LEFT JOIN B ON A.id = B.fk`	All rows from A; keep B data where it exists (good for “orphan detection”).
`FROM A RIGHT JOIN B ON A.id = B.fk`	All rows from B; rare in practice (swap tables for LEFT JOIN).
`FROM A FULL OUTER JOIN B ON A.id = B.fk`	Show every row from both tables – good for data‑reconciliation.
`FROM A CROSS JOIN B`	Cartesian product; use only for combinatorial generation (e.g., all seat‑price combos).
`FROM A JOIN B USING(col)`	When column names are identical – less typing.
`FROM A NATURAL JOIN B`	Implicit join on all same‑named columns – avoid unless you fully control schema.

7️⃣ Knowledge‑Check Quiz (7 questions)

Which join returns every row from the left table and NULLs for the right table when there is no match?
INNER JOIN
LEFT JOIN
FULL OUTER JOIN
What is the result of a CROSS JOIN between a table with 5 rows and another with 8 rows?
13 rows
40 rows
5 rows
True or False – NATURAL JOIN is safe to use in a security‑report because it guarantees you are joining on the right columns.
Which clause can be used to eliminate duplicate rows after a join?
WHERE
GROUP BY
DISTINCT
You have an orders table (order_id, user_id) and a users table (id, name). Which index will most improve the performance of the following query?
```
SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.country = 'USA';
```
Index on orders.id
Index on users.country
Composite index on orders(user_id)
Which of the following statements is incorrect about LEFT JOIN semantics?
The left table’s rows are never filtered out.
NULLs appear for right‑table columns when there is no match.
LEFT JOIN is equivalent to RIGHT JOIN with the table order swapped.
In PostgreSQL, which command displays the actual join execution plan with timing information?
EXPLAIN
EXPLAIN ANALYZE
SHOW PLAN

Show Answers

B – LEFT JOIN keeps all rows from the left side.
B – 5 × 8 = 40 rows (Cartesian product).
False – NATURAL JOIN implicitly matches on *all* same‑named columns; a stray column with the same name can cause an unintended join.
C – DISTINCT removes duplicate rows after a join.
C – A composite index on orders(user_id) (or a separate index on users.id, which is already a primary key) will help the join; the filter on users.country is secondary.
C – LEFT JOIN is not the same as RIGHT JOIN simply by swapping order; the side that receives NULLs matters.
B – EXPLAIN ANALYZE runs the query and prints real execution times.

8️⃣ Further Reading & Templates

Books
- SQL Antipatterns – Bill Karwin (covers misuse of joins).
- Learning SQL – Alan Beaulieu (excellent for beginners).
- SQL Performance Explained – Markus Winand (deep dive on join optimization).
Online Guides
Cheat‑Sheet PDF (download) SQL Join Cheat‑Sheet (PDF)

Lecture 05 · Fundamentals

Aggregation & `GROUP BY`

0️⃣ Why Aggregation Is a Core Skill

Aggregation turns rows into business‑level numbers. In a penetration‑test report you’ll often need to say:

“142 hosts responded on port 22.”
“The average CVSS score of all critical findings is 9.2.”
“Total data exfiltrated: 4.3 GB.”

All of those statements are produced with COUNT(), SUM(), AVG() and friends, combined with GROUP BY.

1️⃣ Sample Schema (Orders ↔ Users ↔ Products)

ER‑diagram (Users ↔ Orders ↔ Order_Items ↔ Products) placeholder

Copy‑paste the DDL below into any RDBMS (PostgreSQL, MySQL or SQLite) and populate it with a few rows – you’ll need it for all the lab exercises.

schema.sql

CREATE TABLE users (
    id      SERIAL      PRIMARY KEY,
    name    VARCHAR(50)   NOT NULL,
    country VARCHAR(30)
);

CREATE TABLE products (
    id    SERIAL      PRIMARY KEY,
    sku   VARCHAR(30)   UNIQUE,
    name  VARCHAR(100)  NOT NULL,
    price NUMERIC(9,2) NOT NULL,
    stock INT         DEFAULT 0
);

CREATE TABLE orders (
    id         SERIAL      PRIMARY KEY,
    user_id    INT         REFERENCES users(id),
    order_date TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    id         SERIAL      PRIMARY KEY,
    order_id   INT         REFERENCES orders(id),
    product_id INT         REFERENCES products(id),
    qty        INT         DEFAULT 1
);

2️⃣ Core Aggregate (Summary) Functions

Function	What It Does	Typical Example
`COUNT(*)`	Counts rows (includes NULLs)	`SELECT COUNT(*) FROM products;`
`COUNT(col)`	Counts non‑NULL values of `col`	`SELECT COUNT(email) FROM users;`
`SUM(col)`	Adds all numeric values	`SELECT SUM(stock) FROM products;`
`AVG(col)`	Average (arithmetic mean)	`SELECT AVG(price) FROM products;`
`MIN(col)`	Smallest value	`SELECT MIN(order_date) FROM orders;`
`MAX(col)`	Largest value	`SELECT MAX(price) FROM products;`
`STRING_AGG(col, ', ')` (Postgres) / `GROUP_CONCAT(col SEPARATOR ', ')` (MySQL)	Concatenates strings per group	`SELECT country, STRING_AGG(name, ', ') FROM users GROUP BY country;`
`JSON_AGG(col)` (Postgres)	Returns a JSON array per group	`SELECT JSON_AGG(row_to_json(p)) FROM products p GROUP BY category;`

All aggregate functions ignore NULL values automatically (except COUNT(*) which counts rows regardless of NULL).

3️⃣ Basic `GROUP BY` Syntax

SELECT category,
       COUNT(*)            AS product_cnt,
       AVG(price)          AS avg_price,
       SUM(stock * price) AS inventory_value
FROM products
GROUP BY category
ORDER BY inventory_value DESC;

Rules to remember:

Every column in the SELECT list must either be an aggregate or appear in the GROUP BY clause.
Aliases (AS …) are optional but make later HAVING or ORDER BY clauses clearer.
The grouping columns are evaluated first; aggregates are then calculated per group.

4️⃣ Filtering Groups – `HAVING`

HAVING works like WHERE but **after** aggregation.

SELECT country,
       COUNT(*)            AS user_cnt
FROM users
GROUP BY country
HAVING COUNT(*) >= 5
ORDER BY user_cnt DESC;

Typical security‑report use‑cases:

Show only assets that have **≥ 3** high‑severity findings.
List countries where the *average* CVSS of discovered vulns exceeds 7.

5️⃣ Advanced Grouping – `ROLLUP`, `CUBE`, `GROUPING SETS`

These constructs let you produce **multiple aggregation levels** in a single query.

5.1 `ROLLUP` – Hierarchical Sub‑Totals

SELECT u.country,
       EXTRACT(YEAR FROM o.order_date) AS yr,
       SUM(p.price * oi.qty) AS revenue
FROM orders o
JOIN users u           ON o.user_id = u.id
JOIN order_items oi  ON o.id = oi.order_id
JOIN products p       ON oi.product_id = p.id
GROUP BY ROLLUP(u.country, EXTRACT(YEAR FROM o.order_date))
ORDER BY u.country, yr;

Result rows include:

Revenue per (country, year).
Yearly subtotal per country (country, NULL).
Grand total (both columns NULL).

5.2 `CUBE` – All Possible Combinations

SELECT u.country,
       EXTRACT(YEAR FROM o.order_date) AS yr,
       COUNT(DISTINCT o.id) AS orders_cnt
FROM orders o
JOIN users u ON o.user_id = u.id
GROUP BY CUBE(u.country, EXTRACT(YEAR FROM o.order_date))
ORDER BY u.country, yr;

CUBE returns every combination, i.e., per country, per year, per country‑year, plus the grand total.

5.3 `GROUPING SETS` – Hand‑Picked Mix

SELECT u.country,
       EXTRACT(MONTH FROM o.order_date) AS month,
       SUM(p.price * oi.qty) AS rev
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
GROUP BY GROUPING SETS(
    (u.country, EXTRACT(MONTH FROM o.order_date)),
    (u.country),
    (EXTRACT(MONTH FROM o.order_date))
)
ORDER BY u.country, month;

Only the three specified groupings appear – no grand total.

6️⃣ Window (Analytic) Functions – Aggregates Without Collapse

Sometimes you need a running total or a rank while still seeing each individual order. Window functions let you do that.

6.1 Syntax Overview

SELECT o.id,
       u.name,
       p.name AS product,
       oi.qty,
       p.price,
       SUM(p.price * oi.qty) OVER(
           PARTITION BY u.id
           ORDER BY o.order_date
       ) AS cumulative_spend
FROM orders o
JOIN users u          ON o.user_id = u.id
JOIN order_items oi   ON o.id = oi.order_id
JOIN products p       ON oi.product_id = p.id
ORDER BY u.id, o.order_date;

6.2 Real‑World Example – Top‑3 Products per Country

SELECT u.country,
       p.name AS product,
       SUM(p.price * oi.qty) AS revenue,
       RANK() OVER(PARTITION BY u.country
                     ORDER BY SUM(p.price * oi.qty) DESC) AS rnk
FROM order_items oi
JOIN orders o      ON oi.order_id = o.id
JOIN users u       ON o.user_id = u.id
JOIN products p    ON oi.product_id = p.id
GROUP BY u.country, p.name
HAVING SUM(p.price * oi.qty) > 0
ORDER BY u.country, rnk
LIMIT 30;   -- 10 countries × top‑3 each

Window functions are **not** listed in GROUP BY; they are calculated after the grouping step.

7️⃣ Performance Tips for Aggregations

Index the grouping columns. An index on products.category speeds GROUP BY category.
Covering indexes (e.g., CREATE INDEX idx_products_cat_price ON products(category, price);) let the engine satisfy both the grouping and the aggregation from the index alone.
Materialized views for heavy daily reports (e.g., CREATE MATERIALIZED VIEW daily_sales AS …).
Filter early. Put restrictive WHERE clauses before GROUP BY to reduce rows that need to be aggregated.
Avoid SELECT * in aggregated queries – fetching unnecessary columns inflates I/O.
Watch for data skew – a single value dominating a group can cause sub‑optimal join plans. Consider partitioning or separate statistics.

7.1 Index Impact Demo

EXPLAIN ANALYZE
SELECT category,
       COUNT(*)       AS cnt,
       AVG(price)   AS avg_price
FROM products
GROUP BY category;

If the plan shows a Seq Scan on a million‑row table, create an index:

CREATE INDEX idx_products_category ON products(category);

Rerun the EXPLAIN ANALYZE – you should see an Index Scan and a dramatically lower total cost.

8️⃣ Common Pitfalls & How to Avoid Them

Missing GROUP BY clause – “column must appear in the GROUP BY clause” error.
Counting with COUNT(col) when you need total rows – COUNT(col) skips NULLs.
Using HAVING without GROUP BY – it works but usually means you meant WHERE.
Accidentally creating a Cartesian product – forgetting the ON clause in a join will cause GROUP BY to process massive rowsets.
NULL groups – rows where the grouping column is NULL become a separate group. Use COALESCE(col,'N/A') if you prefer them merged.
Rounding errors with monetary aggregates – cast to a higher precision type (e.g., price::numeric(12,4)) before summing.

9️⃣ Hands‑On Lab – Aggregation Challenges

🎯 Exercise 5.1 — Load the Sample Schema

Run schema.sql (Section 1). Insert at least 20 products spanning 4 categories, 15 users across 3 countries, and 30 orders with associated order_items. Feel free to use any data you like – the point is to have enough rows for meaningful grouping.

🎯 Exercise 5.2 — Write Five Aggregation Queries

Total inventory value per category – SUM(price * stock) grouped by category.
Average order value per country – join the four tables and AVG(price * qty) grouped by users.country.
Top‑3 selling products overall – order by SUM(qty) descending, LIMIT 3.
Countries with more than 5 orders – use GROUP BY users.country and HAVING COUNT(DISTINCT orders.id) > 5.
Running total of revenue per user (window function) – calculate a cumulative sum ordered by order_date.

Capture a screenshot of each result and write a one‑sentence interpretation (e.g., “Electronics holds the highest inventory value: $125 k”).

🎯 Exercise 5.3 — Performance Test with ROLLUP

Run the ROLLUP query from Section 5.1 on a dataset of at least 10 000 orders. Record the execution time with EXPLAIN ANALYZE **before** adding any indexes, then create indexes on orders.order_date and users.country and repeat. Write a ≤ 100‑word summary of the performance impact.

10️⃣ Cheat‑Sheet – Aggregation & `GROUP BY` Quick Reference

Clause / Function	Purpose / Example
`COUNT(*)`	Rows in the result set.
`COUNT(col)`	Non‑NULL values in `col`.
`SUM(col)`	Additive total.
`AVG(col)`	Mean value.
`MIN(col) / MAX(col)`	Smallest / largest value.
`GROUP BY col1, col2 …`	Define grouping columns.
`HAVING condition`	Filter groups (post‑aggregation).
`ORDER BY col ASC\|DESC`	Sort final rows.
`ROLLUP(col1, col2)`	Hierarchical subtotals + grand total.
`CUBE(col1, col2)`	All combination totals.
`GROUPING SETS((…), (…))`	Custom mix of groupings.
`SUM(col) OVER (PARTITION BY … ORDER BY …)`	Running total / rank without collapsing rows.
`STRING_AGG(col, ', ')` (Postgres) / `GROUP_CONCAT(col SEPARATOR ', ')` (MySQL)	Concatenate strings per group.

11️⃣ Knowledge‑Check Quiz (7 questions)

Which query returns the number of distinct product categories?
SELECT COUNT(*) FROM products;
SELECT COUNT(DISTINCT category) FROM products;
SELECT COUNT(category) FROM products;
What does this query do?
```
SELECT country, COUNT(*) AS users
FROM users
GROUP BY country
HAVING COUNT(*) > 3;
```
A. Returns countries with **exactly** 3 users.
B. Returns countries with **more than** 3 users.
C. Returns every country regardless of user count.
True or False – GROUP BY and DISTINCT are interchangeable.
Which construct will give you a grand total, per‑country subtotal, per‑year subtotal, and per‑country‑year cells in a single result set?
GROUP BY ROLLUP(country, YEAR(order_date))
GROUP BY CUBE(country, YEAR(order_date))
GROUP BY GROUPING SETS((country, YEAR), (country), (YEAR))
In the window‑function example below, what is the effect of PARTITION BY u.id?
```
SELECT o.id,
       SUM(p.price * oi.qty) OVER (PARTITION BY u.id ORDER BY o.order_date) AS cum_spend
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;
```
A. Groups rows by user before calculating the cumulative sum.
B. Filters rows to only those where u.id matches the current row.
C. Orders the final result set by u.id.
Which index will most improve this query’s performance?
```
SELECT country, COUNT(*) AS cnt
FROM users
WHERE country = 'USA'
GROUP BY country;
```
Index on users.id.
Composite index on (country, id).
Index on users.country.
What does the GROUPING() function return inside a ROLLUP query?

Show Answers

B – COUNT(DISTINCT category) gives the number of unique categories.
B – HAVING COUNT(*) > 3 filters for groups with more than three users.
False – GROUP BY groups rows, while DISTINCT removes duplicate rows; they are not interchangeable.
A – ROLLUP(country, YEAR(...)) yields the grand total, country subtotals, year subtotals, and country‑year cells.
A – PARTITION BY u.id creates a separate window (bucket) per user, then computes the cumulative sum inside that bucket ordered by date.
C – An index on the filtered column (users.country) allows the DB engine to satisfy both the WHERE and the GROUP BY efficiently.
The GROUPING() function returns 1 for a column that is being aggregated at a higher level (i.e., when the column’s value is replaced by NULL in a subtotal or grand‑total row) and 0 otherwise. It is used to differentiate real NULL values from the artificial NULLs generated by ROLLUP/CUBE.

12️⃣ Further Reading & Templates

Books
- SQL Performance Explained – Markus Winand (excellent chapter on aggregation).
- SQL Antipatterns – Bill Karwin (covers common aggregation mistakes).
- Learning SQL – Alan Beaulieu (good for beginners).
Official Docs
Cheat‑Sheet (PDF) – Aggregation Cheat‑Sheet (PDF)
Interactive Playground
- DB‑Fiddle – run PostgreSQL/MySQL/SQLite snippets online.
- Mode Analytics SQL Tutorial (real‑world data sets)

Lecture 06 · Core Concepts

Subqueries & CTEs

Introduction to Subqueries

A subquery is a query written inside another SQL query. Subqueries allow developers to break complex problems into smaller logical steps.

Subqueries are commonly used for:

Filtering data
Comparing values
Generating temporary result sets
Performing calculations
Building advanced reports

Basic Subquery Example

A subquery can be placed inside a WHERE clause.

SELECT name
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);

The inner query calculates the average salary, while the outer query returns employees earning above that average.

How Subqueries Work

SQL executes the inner query first, then uses the returned result in the outer query.

This allows queries to become dynamic and data-driven.

Subqueries in the SELECT Clause

Subqueries can also appear inside the SELECT statement.

SELECT
    name,
    (
        SELECT department_name
        FROM departments
        WHERE departments.id = employees.department_id
    ) AS department
FROM employees;

This query retrieves employee names along with department names.

Subqueries in the FROM Clause

A subquery inside the FROM clause acts like a temporary table.

SELECT avg_salary.department_id,
       avg_salary.average
FROM (
    SELECT department_id,
           AVG(salary) AS average
    FROM employees
    GROUP BY department_id
) AS avg_salary;

This approach improves readability for complex calculations.

Single-Row vs Multi-Row Subqueries

Some subqueries return one value, while others return multiple rows.

Single-row subqueries are often used with operators such as:

=
>
<

Multi-row subqueries are commonly used with:

IN
ANY
ALL

Using IN with Subqueries

SELECT name
FROM employees
WHERE department_id IN (
    SELECT id
    FROM departments
    WHERE location = 'London'
);

This query finds employees working in departments located in London.

Correlated Subqueries

A correlated subquery depends on values from the outer query.

SELECT name,
       salary
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);

The inner query runs once for every row processed by the outer query.

Introduction to CTEs

A Common Table Expression (CTE) is a temporary named result set created using the WITH keyword.

CTEs improve readability and make complex queries easier to manage.

Common Table Expressions (CTE)

WITH high_value_orders AS (

    SELECT *
    FROM orders
    WHERE total > 1000
)

SELECT *
FROM high_value_orders;

The CTE creates a temporary result called high_value_orders.

Why Use CTEs?

CTEs provide several advantages:

Improve query readability.
Reduce repeated logic.
Make debugging easier.
Organize complex SQL statements.
Support recursive queries.

Multiple CTEs

You can define multiple CTEs in a single query.

WITH sales_total AS (

    SELECT customer_id,
           SUM(total) AS total_sales
    FROM orders
    GROUP BY customer_id
),

top_customers AS (

    SELECT *
    FROM sales_total
    WHERE total_sales > 5000
)

SELECT *
FROM top_customers;

This structure makes long queries much easier to understand.

Recursive CTEs

Recursive CTEs are used for hierarchical or tree-structured data.

WITH RECURSIVE numbers AS (

    SELECT 1 AS n

    UNION ALL

    SELECT n + 1
    FROM numbers
    WHERE n < 5
)

SELECT *
FROM numbers;

This query generates numbers from 1 to 5 recursively.

CTEs vs Subqueries

Both CTEs and subqueries solve similar problems, but they differ in readability and structure.

Subqueries are compact for small operations.
CTEs are better for complex multi-step logic.
CTEs improve maintainability.
Recursive operations require CTEs.

Performance Considerations

In some databases, CTEs may improve readability but not necessarily performance.

Database engines optimize queries differently, so performance testing is important for large datasets.

Real-World Use Cases

Subqueries and CTEs are heavily used in:

Business reporting
Analytics dashboards
Financial systems
Inventory management
Data warehousing
Hierarchical organizational data

Common Beginner Mistakes

Using subqueries when joins are simpler.
Writing deeply nested unreadable queries.
Forgetting aliases for subqueries.
Creating inefficient correlated subqueries.
Confusing recursive and non-recursive CTEs.

Best Practices

Use meaningful CTE names.
Keep queries readable and modular.
Use CTEs for multi-step transformations.
Test performance on large datasets.
Avoid unnecessary nested queries.

Practice

Create a subquery that returns employees earning more than the average salary.

Then create a CTE that selects products with stock greater than 100 and display the results.

Finally, write a recursive CTE that generates numbers from 1 to 10.

Summary

Subqueries and Common Table Expressions are powerful SQL tools used to build advanced and organized queries. Subqueries help solve problems step-by-step, while CTEs improve readability and support recursive operations.

Mastering these concepts is essential for writing professional SQL queries used in real-world applications and enterprise databases.

Lecture 07 · Core Concepts

Data Modification

Introduction to Data Modification

Databases are not only used for retrieving data — they are also used to create, modify, and remove information.

SQL provides several commands for changing database records safely and efficiently.

INSERT — adds new data
UPDATE — modifies existing data
DELETE — removes data

INSERT, UPDATE, DELETE

INSERT INTO users (name, email)
VALUES ('Bob', 'bob@example.com');

UPDATE products
SET price = price * 1.1
WHERE category = 'Tech';

DELETE FROM logs
WHERE created_at < '2023-01-01';

Understanding INSERT

The INSERT INTO statement adds new rows into a table.

This command is commonly used in:

User registration systems
Order management systems
Inventory applications
Blog and CMS platforms

Basic INSERT Syntax

INSERT INTO students (name, age)
VALUES ('Alice', 21);

Here, a new row is inserted into the students table.

Inserting Multiple Rows

SQL allows inserting multiple records in a single query.

INSERT INTO products (name, price)
VALUES
('Mouse', 500),
('Keyboard', 1200),
('Monitor', 9000);

INSERT Without Specifying Columns

If values are provided in exact table order, column names can be omitted.

INSERT INTO users
VALUES (1, 'Ahmed', 'ahmed@example.com');

However, specifying column names is considered a better practice.

Understanding UPDATE

The UPDATE statement modifies existing records.

UPDATE employees
SET salary = 50000
WHERE id = 1;

This query updates the salary of the employee with ID 1.

Updating Multiple Columns

UPDATE users
SET
    city = 'Delhi',
    status = 'Active'
WHERE id = 5;

Importance of WHERE Clause

The WHERE clause determines which rows are affected.

UPDATE users
SET status = 'Inactive';

Without a WHERE clause, all rows in the table will be updated.

Understanding DELETE

The DELETE statement removes records from a table.

DELETE FROM customers
WHERE id = 10;

This removes the customer whose ID equals 10.

Deleting Multiple Records

DELETE FROM orders
WHERE status = 'Cancelled';

Deleting All Rows

Removing the WHERE clause deletes all records.

DELETE FROM logs;

Use this carefully because deleted rows may not be recoverable.

TRUNCATE vs DELETE

DELETE	TRUNCATE
Removes selected rows	Removes all rows
Supports WHERE clause	No WHERE clause
Slower for large tables	Faster operation

Using Conditions in Queries

Conditions help target specific rows during modification.

UPDATE products
SET stock = stock - 1
WHERE product_id = 100;

Working with NULL Values

UPDATE users
SET phone = NULL
WHERE id = 3;

Using Arithmetic Operations

SQL can perform calculations during updates.

UPDATE employees
SET salary = salary + 5000
WHERE department = 'IT';

Transactions and Safety

Transactions help ensure database consistency.

BEGIN;

UPDATE accounts
SET balance = balance - 1000
WHERE id = 1;

UPDATE accounts
SET balance = balance + 1000
WHERE id = 2;

COMMIT;

Transactions are especially useful in banking and payment systems.

Rollback Example

ROLLBACK;

Rollback cancels changes before they are permanently saved.

Best Practices

Always use WHERE with UPDATE and DELETE
Back up important data before modifications
Test queries on small datasets first
Use transactions for critical operations
Validate user input before database changes

Common Mistakes

Forgetting the WHERE clause
Inserting duplicate data accidentally
Deleting records without backup
Using incorrect column order
Ignoring transaction handling

Real-World Examples

User account registration
Updating product inventory
Deleting expired logs
Processing customer orders
Managing employee databases

Mini Practice

Try writing queries for the following:

Insert a new student record
Update a product price
Delete inactive users
Increase salaries by 10%
Use transactions with COMMIT and ROLLBACK

Lecture 08 · Core Concepts

Constraints & Indexes

Introduction to Constraints

Constraints are rules applied to database tables to ensure data accuracy, consistency, and integrity.

They help prevent invalid or duplicate data from being inserted into a database.

Constraints are essential in professional database systems because they enforce business rules directly at the database level.

Why Constraints Matter

Without constraints, databases may contain:

Duplicate records
Missing required values
Invalid relationships
Incorrect data formats
Orphaned records

Constraints help maintain reliable and trustworthy data.

Ensuring Data Integrity

CREATE TABLE authors (

    id SERIAL PRIMARY KEY,

    name TEXT NOT NULL,

    email TEXT UNIQUE
);

This table demonstrates multiple important constraints:

PRIMARY KEY ensures unique row identification.
NOT NULL prevents empty values.
UNIQUE prevents duplicate emails.

Primary Key Constraint

A primary key uniquely identifies each row in a table.

CREATE TABLE students (

    student_id INT PRIMARY KEY,

    name TEXT
);

Primary keys cannot contain duplicate or NULL values.

Auto-Incrementing Keys

Databases often generate primary key values automatically.

id SERIAL PRIMARY KEY

The SERIAL keyword automatically increases values for each new row.

NOT NULL Constraint

The NOT NULL constraint requires a column to always contain a value.

CREATE TABLE products (

    name TEXT NOT NULL,

    price DECIMAL
);

This prevents incomplete records from being inserted.

UNIQUE Constraint

The UNIQUE constraint ensures all values in a column are different.

CREATE TABLE users (

    username TEXT UNIQUE
);

This is commonly used for usernames, emails, and account IDs.

DEFAULT Constraint

The DEFAULT constraint assigns automatic values when no value is provided.

CREATE TABLE orders (

    status TEXT DEFAULT 'Pending'
);

If no status is specified, the database automatically inserts Pending.

CHECK Constraint

The CHECK constraint validates data using conditions.

CREATE TABLE employees (

    age INT CHECK (age >= 18)
);

This prevents invalid age values from being inserted.

Foreign Key Constraint

Foreign keys create relationships between tables.

CREATE TABLE books (

    id SERIAL PRIMARY KEY,

    author_id INT REFERENCES authors(id)
);

The author_id must match an existing value in the authors table.

Referential Integrity

Foreign keys help maintain referential integrity by ensuring related records remain valid.

This prevents orphaned records and broken relationships.

Composite Keys

A composite key combines multiple columns into a single primary key.

CREATE TABLE enrollments (

    student_id INT,

    course_id INT,

    PRIMARY KEY (student_id, course_id)
);

This ensures each student-course combination is unique.

Introduction to Indexes

Indexes improve database query performance by allowing faster data retrieval.

Without indexes, databases may need to scan entire tables to locate records.

How Indexes Work

Indexes function similarly to a book index. Instead of reading every page, the database quickly locates the required information.

Indexes significantly improve search speed for large datasets.

Creating an Index

CREATE INDEX idx_author_name
ON authors(name);

This creates an index on the name column.

Indexes on Multiple Columns

Indexes can include multiple columns.

CREATE INDEX idx_employee
ON employees(last_name, first_name);

Multi-column indexes improve queries involving multiple search conditions.

Unique Indexes

A unique index prevents duplicate values while also improving search performance.

CREATE UNIQUE INDEX idx_email
ON users(email);

When to Use Indexes

Indexes are useful for columns frequently used in:

WHERE clauses
JOIN operations
ORDER BY clauses
GROUP BY statements

When Too Many Indexes Become a Problem

Although indexes improve reading speed, they can slow down insert, update, and delete operations.

Each index must also be updated whenever table data changes.

Clustered vs Non-Clustered Indexes

Some database systems support different index types:

Clustered Index: Organizes physical table data.
Non-Clustered Index: Stores separate index structures.

The exact behavior depends on the database system being used.

Viewing Existing Indexes

Most databases provide commands to inspect indexes.

SELECT *
FROM pg_indexes
WHERE tablename = 'authors';

Dropping an Index

Indexes can be removed if no longer needed.

DROP INDEX idx_author_name;

Real-World Uses

Constraints and indexes are critical in:

Banking systems
E-commerce applications
Inventory systems
Authentication systems
Large enterprise databases
Analytics platforms

Benefits of Constraints & Indexes

Improve data integrity.
Prevent invalid records.
Enforce business rules.
Increase query performance.
Support scalable applications.

Common Beginner Mistakes

Creating too many unnecessary indexes.
Forgetting foreign key relationships.
Using NULL values incorrectly.
Ignoring unique constraints on important fields.
Adding indexes to very small tables unnecessarily.

Best Practices

Always define primary keys.
Use foreign keys for related tables.
Index frequently searched columns.
Validate important business rules using constraints.
Regularly monitor query performance.

Practice

Create a table called customers with:

A primary key column
A unique email column
A NOT NULL name column
A CHECK constraint for age greater than 18

Then create an index on the email column.

Summary

Constraints and indexes are essential database features used to maintain reliable data and improve performance. Constraints enforce data integrity rules, while indexes optimize query execution speed.

Understanding these concepts is critical for designing scalable, secure, and efficient database systems.

Lecture 09 · Advanced

Transactions & ACID

All or Nothing

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Lecture 10 · Advanced

Stored Procedures

Functions and procedures allow you to save SQL logic on the database server.

Lecture 11 · Advanced

Window Functions

Advanced Analytics

SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) as ranking
FROM employees;

Lecture 12 · Capstone

Capstone Project: E-commerce DB

Design and implement a database schema for an e-commerce platform, including products, users, orders, and reviews.

-- Project goals:
-- 1. Create tables with proper constraints
-- 2. Populate with sample data
-- 3. Write complex queries for sales reports
-- 4. Optimize queries with indexes

Lecture 13 · Advanced

Database Design & Normalization

Requires: Lecture 12

Content coming soon...

Lecture 14 · Advanced

Performance & Query Optimization

Requires: Lecture 13

Content coming soon...

Lecture 15 · Advanced

Security & Access Control

Requires: Lecture 14

Content coming soon...

Lecture 16 · Professional

Final Project — Enterprise Database

Requires: All Previous

Content coming soon...

Introduction to SQL & Setup

0️⃣ What is SQL?

Brief History & Why It Still Matters

Market Demand (2024)

1️⃣ Setup – Get a DB Up & Running

Option A – Native Install (PostgreSQL)

Option B – Lightweight SQLite (Zero‑Install)

Option C – Docker (All Platforms)

SQL Client Recommendations

2️⃣ First‑Look SQL – “Hello SQL” & Basic Statements

Core DDL (Data Definition Language)

Core DML (Data Manipulation Language)

💻 Try It Yourself - Multi-Language Compiler

💡 Practice Tips:

Quick Reference Cheat‑Sheet

3️⃣ Sample Schema – Employees & Departments

Insert Sample Data

Visualise the Relationship (placeholder)

Example Queries

4️⃣ Indexes & Query Performance Basics

Why Indexes Matter

Creating an Index

Explain Plan – Seeing What the Optimiser Does

5️⃣ Common Pitfalls – SQL Injection & Safe Coding

Vulnerable Example (PHP)

Safe Alternative – Prepared Statements

6️⃣ Lab Exercises (Extended)

7️⃣ Knowledge‑Check Quiz (5 questions)

8️⃣ Further Reading & Resources

Basic SELECT Queries

1️⃣ Retrieving Data – The Core of SQL

2️⃣ Aliases – Giving Columns & Tables Friendly Names

3️⃣ Filtering Rows – WHERE Clause

4️⃣ Ordering & Limiting – ORDER BY, LIMIT/OFFSET

5️⃣ Removing Duplicates – DISTINCT

6️⃣ Aggregates & Grouping – Summaries & Statistics

7️⃣ Joins – Bringing Data Together

8️⃣ Set Operators – Combining Result Sets

9️⃣ Sub‑queries – Nesting SELECTs

Inline (derived table)

Correlated sub‑query (depends on outer row)

10️⃣ Window Functions – Row‑Level Analytics

11️⃣ Performance Tips – EXPLAIN and Indexes

12️⃣ Safety – Avoiding SQL Injection in SELECTs

13️⃣ Cheat‑Sheet – SELECT Syntax Overview

14️⃣ Sample Schema – Customers, Orders, Products

15️⃣ Lab Exercises (Extended)

16️⃣ Knowledge‑Check Quiz (7 questions)

17️⃣ Further Reading & Resources

Filtering & Sorting

1️⃣ The WHERE Clause – Selecting the Right Rows

1.1 Comparison & Logical Operators

1.2 Practical WHERE Samples

2️⃣ Ordering Results – ORDER BY

2.1 Basic Syntax

2.2 Controlling NULL Placement

2.3 Case‑Insensitive / Locale‑Aware Sorting

2.4 Ordering by an Expression (Custom Sort)

2.5 Random Ordering (caution!)

3️⃣ Pagination – LIMIT & OFFSET

Keyset Pagination (Seek Method)

4️⃣ Performance – How Filtering & Sorting Interact with Indexes

4.1 Index Usage Basics

4.2 EXPLAIN – Visualising the Plan

4.3 Common Pitfalls

5️⃣ Security – Safe Filtering & Sorting

6️⃣ Cheat‑Sheet – Filtering & Sorting Summary

7️⃣ Sample Schema – Customers, Orders, Products

8️⃣ Lab Exercises (Hands‑On)

9️⃣ Knowledge‑Check Quiz (7 questions)

10️⃣ Further Reading & Resources

Joins – Combining Data From Multiple Tables

0️⃣ Why Joins Matter

1️⃣ Sample Schema (Orders, Users, Products, Order_Items)

2️⃣ Join Types – Theory (with Venn diagrams)

3️⃣ Practical Join Syntax – Variations & Shortcuts

3.1 Using USING (when column names match)

3.2 Table Aliases – make queries readable

3.3 Self‑Join – hierarchical data (e.g., employee‑manager)

3.4 Joining More Than Two Tables

Option A – Native Install (PostgreSQL)

Option B – Lightweight SQLite (Zero‑Install)

Option C – Docker (All Platforms)

2️⃣ First‑Look SQL – “Hello SQL” & Basic Statements

7️⃣ Knowledge‑Check Quiz (5 questions)

3️⃣ Filtering Rows – `WHERE` Clause

4️⃣ Ordering & Limiting – `ORDER BY`, `LIMIT`/`OFFSET`

5️⃣ Removing Duplicates – `DISTINCT`

11️⃣ Performance Tips – `EXPLAIN` and Indexes

16️⃣ Knowledge‑Check Quiz (7 questions)

1️⃣ The `WHERE` Clause – Selecting the Right Rows

1.2 Practical `WHERE` Samples

2️⃣ Ordering Results – `ORDER BY`

3️⃣ Pagination – `LIMIT` & `OFFSET`

9️⃣ Knowledge‑Check Quiz (7 questions)

3.1 Using `USING` (when column names match)

7️⃣ Knowledge‑Check Quiz (7 questions)

8️⃣ Further Reading & Templates

Aggregation & `GROUP BY`

1️⃣ Sample Schema (Orders ↔ Users ↔ Products)

3️⃣ Basic `GROUP BY` Syntax

4️⃣ Filtering Groups – `HAVING`

5️⃣ Advanced Grouping – `ROLLUP`, `CUBE`, `GROUPING SETS`

5.1 `ROLLUP` – Hierarchical Sub‑Totals

5.2 `CUBE` – All Possible Combinations

5.3 `GROUPING SETS` – Hand‑Picked Mix

10️⃣ Cheat‑Sheet – Aggregation & `GROUP BY` Quick Reference

11️⃣ Knowledge‑Check Quiz (7 questions)