Lecture 1 / 12
Lecture 01 · Fundamentals

Introduction to SQL & Setup

Beginner ~45 min

0️⃣ What is SQL?

SQL (Structured Query Language) is the ANSI‑standard language used to create, read, update, and delete data in relational database management systems (RDBMS). It abstracts the underlying storage engine and lets you work with tables (relations) that are linked via keys.

Brief History & Why It Still Matters

  • 1970 – Edgar F. Codd proposes the relational model (IBM System R).
  • 1974 – IBM releases SQL/2, the first commercial implementation.
  • 1992 – ANSI/ISO adopts SQL‑92 as a formal standard (still the baseline).
  • 2000‑2024 – Continual extensions (SQL:1999, SQL:2003, …) add XML/JSON handling, window functions, and more.
RDBMSLicenseTypical Use‑CaseLatest Stable Version (2024)
PostgreSQLOpen‑source (PostgreSQL License)Enterprise‑grade, geo‑spatial, analytics16.2
MySQL / MariaDBDual (GPL / Commercial)Web‑apps, LAMP stack8.4 / 10.11
Microsoft SQL ServerCommercial (Free Express)Windows‑centric, BI2022 CU15
Oracle DatabaseCommercial (Free XE)Large‑scale transaction processing23c
SQLitePublic‑domain (no server)Embedded, mobile, prototyping3.45

Market Demand (2024)

SkillAvg Salary (US $)Stack Overflow Survey % of Respondents
SQL / Relational DB≈ $115 k78 %
PostgreSQL≈ $120 k43 %
MySQL≈ $112 k55 %
SQLite (mobile/IoT)≈ $108 k31 %

Source: Stack Overflow Developer Survey 2024 + Glassdoor.

⚠️️ Security Reminder
Even a simple SELECT can become dangerous when user input is concatenated directly into the query. Always use **parameterised queries** (prepared statements) to avoid SQL injection.

1️⃣ Setup – Get a DB Up & Running

Option A – Native Install (PostgreSQL)

  • WindowsDownload the installer, run, accept defaults (creates a postgres super‑user).
  • macOSbrew install postgresql then brew services start postgresql.
  • Linux (Debian/Ubuntu)sudo apt update && sudo apt install -y postgresql postgresql-contrib.
    Verify: sudo -u postgres psql -c "SELECT version();"

Option B – Lightweight SQLite (Zero‑Install)

SQLite ships with Python, PHP, and many OSes. A single file is all you need.

bash
# Verify SQLite is present
sqlite3 --version    # e.g. 3.45.0

# Create a new DB file called demo.db and open the CLI
sqlite3 demo.db

# Inside the CLI you’ll see the prompt: sqlite>

Option C – Docker (All Platforms)

Docker gives you an isolated RDBMS with a single command. No admin rights required.

docker
# Pull the official PostgreSQL image
docker pull postgres:16-alpine

# Run it (exposes 5432 to host)
docker run -d \
    --name pg_sql_demo \
    -e POSTGRES_PASSWORD=SuperSecret123 \
    -e POSTGRES_USER=demo_user \
    -e POSTGRES_DB=demo_db \
    -p 5432:5432 \
    postgres:16-alpine

# Connect from host
psql -h localhost -U demo_user -d demo_db -W   # password: SuperSecret123

SQL Client Recommendations

  • DBeaver – universal cross‑platform GUI (JDBC/ODBC).
  • pgAdmin – native PostgreSQL web‑based console.
  • Azure Data Studio – Microsoft’s cross‑platform query editor (works with PostgreSQL, MySQL, SQL Server).
  • Command‑line: psql (PostgreSQL), mysql, sqlite3.

2️⃣ First‑Look SQL – “Hello SQL” & Basic Statements

SQL (pSQL prompt) – Hello
SELECT 'Hello, SQL Mastery!' AS greeting;
Output
greeting
Hello, SQL Mastery!

Core DDL (Data Definition Language)

  • CREATE TABLE – define a table schema.
  • ALTER TABLE – add/modify columns, constraints.
  • DROP TABLE – delete a table.

Core DML (Data Manipulation Language)

  • SELECT … FROM … – read data (the “R” in CRUD).
  • INSERT INTO … VALUES … – add rows.
  • UPDATE … SET … WHERE … – modify rows.
  • DELETE FROM … WHERE … – remove rows.

💻 Try It Yourself - Multi-Language Compiler

Practice SQL and many other programming languages right here in your browser! Switch between languages, modify the code, and click "Run" to see results instantly.

💡 Practice Tips:

  • Switch to SQL in the language selector and try the database query examples
  • Experiment with SQL's SELECT, INSERT, UPDATE, and DELETE statements
  • Try other data languages like Python, R, or compare with database concepts
  • Use the "Load Example" button to see SQL-specific code samples
  • Use Ctrl+Enter to quickly run your code

Quick Reference Cheat‑Sheet

StatementPurposeBasic Syntax Example
CREATE TABLE Define a new relation CREATE TABLE employees (id SERIAL PRIMARY KEY, name TEXT NOT NULL, dept_id INT);
INSERT Add a row INSERT INTO employees (name,dept_id) VALUES ('Alice',1);
SELECT Read data SELECT name,dept_id FROM employees WHERE dept_id = 1;
UPDATE Modify existing rows UPDATE employees SET dept_id = 2 WHERE name = 'Alice';
DELETE Remove rows DELETE FROM employees WHERE name = 'Alice';
JOIN Combine rows from multiple tables SELECT e.name, d.name FROM employees e JOIN departments d ON e.dept_id = d.id;
GROUP BY / HAVING Aggregate data SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id HAVING COUNT(*) > 5;
ORDER BY Sort results SELECT * FROM employees ORDER BY name ASC;
LIMIT / OFFSET Paginate results SELECT * FROM employees LIMIT 10 OFFSET 20;

3️⃣ Sample Schema – Employees & Departments

Let’s build a tiny HR database that you can reuse for many hands‑on tasks.

SQL – Schema Creation (PostgreSQL)
CREATE TABLE departments (
    id   SERIAL      PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE employees (
    emp_id      SERIAL      PRIMARY KEY,
    first_name  VARCHAR(30) NOT NULL,
    last_name   VARCHAR(30) NOT NULL,
    email       VARCHAR(100) UNIQUE,
    hire_date   DATE        DEFAULT CURRENT_DATE,
    salary      NUMERIC(9,2) CHECK (salary > 0),
    dept_id     INT         REFERENCES departments(id) ON DELETE SET NULL
);

Insert Sample Data

SQL – Sample Rows
INSERT INTO departments (name) VALUES
    ('Engineering'), ('Marketing'), ('HR'), ('Finance');

INSERT INTO employees (first_name, last_name, email, salary, dept_id) VALUES
    ('Alice',  'Smith',   'alice.smith@example.com',  87000, 1),
    ('Bob',    'Brown',   'bob.brown@example.com',    62000, 2),
    ('Carol',  'Jones',   'carol.jones@example.com',  54000, 3),
    ('David',  'Lee',     'david.lee@example.com',   103000, 1),
    ('Eve',    'Taylor',  'eve.taylor@example.com',   77000, 4);

Visualise the Relationship (placeholder)

ER‑diagram (departments 1‑* employees) goes here

Example Queries

SQL – 5 Common Queries
-- 1️⃣ List all employees with department name
SELECT e.emp_id, e.first_name, e.last_name, d.name AS department
FROM   employees e
JOIN   departments d ON e.dept_id = d.id
ORDER BY e.emp_id;

-- 2️⃣ Average salary per department
SELECT d.name, ROUND(AVG(e.salary),2) AS avg_salary
FROM   employees e
JOIN   departments d ON e.dept_id = d.id
GROUP BY d.name
HAVING COUNT(*) > 0;

-- 3️⃣ Employees hired in the last 30 days
SELECT * FROM employees
WHERE hire_date > CURRENT_DATE - INTERVAL '30 days';

-- 4️⃣ Find employees with duplicate email domains (e.g., possible data‑quality issue)
SELECT SUBSTRING(email FROM POSITION('@' IN email)+1) AS domain, COUNT(*) AS cnt
FROM   employees
GROUP BY domain
HAVING COUNT(*) > 1;

-- 5️⃣ Delete a department (will set dept_id of its employees to NULL)
DELETE FROM departments WHERE name = 'Marketing';

4️⃣ Indexes & Query Performance Basics

Why Indexes Matter

  • They are data structures (usually B‑trees) that let the engine locate rows without scanning the whole table.
  • Typical usage: columns in WHERE, JOIN, ORDER BY, GROUP BY.
  • Trade‑off: faster reads, slower writes (INSERT/UPDATE) and extra storage.

Creating an Index

SQL – Create Index
CREATE INDEX idx_emp_email ON employees(email);
CREATE INDEX idx_emp_dept ON employees(dept_id);

Explain Plan – Seeing What the Optimiser Does

SQL – EXPLAIN (PostgreSQL)
EXPLAIN ANALYZE
SELECT e.first_name, e.last_name, d.name
FROM   employees e
JOIN   departments d ON e.dept_id = d.id
WHERE  e.salary > 80000;

The output shows whether the query uses an index scan, sequential scan, or a hash join, and reports actual execution time in ms.

5️⃣ Common Pitfalls – SQL Injection & Safe Coding

SQL Injection occurs when untrusted input is concatenated into a query string without sanitisation.

Vulnerable Example (PHP)

PHP – Unsafe Query
$id   = $_GET['id'];                 // e.g., ?id=1 OR 1=1
$sql  = "SELECT * FROM employees WHERE emp_id = $id";
$res  = $pdo->query($sql);

Safe Alternative – Prepared Statements

PHP – PDO Prepared
$stmt = $pdo->prepare('SELECT * FROM employees WHERE emp_id = :id');
$stmt->execute(['id' => $_GET['id']]);   // type‑safe, DB driver does the escaping
$res  = $stmt->fetchAll();

All languages have similar mechanisms (Python’s cursor.execute(..., params), Java’s PreparedStatement, .NET’s SqlCommand with parameters, etc.).

🚨 Never
  • Use string interpolation to build SQL.
  • Trust client‑side validation alone.
  • Store raw passwords in plain text – always hash (bcrypt, Argon2).

6️⃣ Lab Exercises (Extended)

🎯 Exercise 1.1 – Install & Verify a Database

Choose one of the three installation methods (native PostgreSQL, SQLite, Docker). Then:

  1. Start the DB server (or open the SQLite CLI).
  2. Run the “Hello, SQL Mastery!” query and capture a screenshot of the output.
  3. Record the exact command you typed (including any environment variables).
🎯 Exercise 1.2 – Build the Employees Sample Schema

Using the SQL script provided above:

  1. Create the departments and employees tables.
  2. Insert the sample rows.
  3. Run the “Average salary per department” query and note any department with an average > $80 k.
  4. Write a 150‑word “lab journal” describing any errors you encountered and how you fixed them.
🎯 Exercise 1.3 – Index & Explain

For the employees.email column:

  1. Create an index as shown earlier.
  2. Run the same SELECT … WHERE email = 'alice.smith@example.com' query with the index and capture the EXPLAIN ANALYZE output.
  3. Delete the index (DROP INDEX idx_emp_email;) and re‑run the query – compare timings.
  4. Summarise (≤ 100 words) why the index helped (or didn’t) in this case.
🎯 Exercise 1.4 – Safe Coding Mini‑Project (any language)

Write a short program (≤ 50 lines) that:

  • Accepts a user‑supplied employee ID from the console.
  • Queries the employees table for that ID using a prepared statement.
  • Prints the employee’s full name and department.

Include the source code and a screenshot of a successful run.

7️⃣ Knowledge‑Check Quiz (5 questions)

  1. Which SQL clause is used to combine rows from two tables?
    GROUP BY
    JOIN
    UNION
  2. What does the SERIAL data type do in PostgreSQL?
    Stores binary data.
    Auto‑increments an integer and creates a sequence.
    Stores a fixed-length string of 20 characters.
  3. True or False – DROP TABLE can be rolled back inside a transaction in PostgreSQL.
  4. Which of the following is a **safe** way to guard against SQL injection?
    Escape single quotes manually.
    Use a prepared statement / parameterised query.
    Validate the input length only.
  5. What is the purpose of an INDEX?

Click to reveal the answers.

Show Answers
  1. BJOIN combines rows from two tables based on a condition.
  2. BSERIAL creates an auto‑increment integer backed by a sequence.
  3. False – Dropping a table is DDL and cannot be rolled back in PostgreSQL (unless using a schema‑level transaction in recent versions, but standard practice treats it as non‑reversible).
  4. B – Parameterised queries are the industry‑standard defense.
  5. An INDEX provides a fast lookup mechanism (typically a B‑tree) that reduces the amount of data scanned for queries that filter, sort, or join on the indexed column(s).

8️⃣ Further Reading & Resources

Lecture 02 · Fundamentals

Basic SELECT Queries

Beginner ~40 min

1️⃣ Retrieving Data – The Core of SQL

The SELECT statement is the primary read‑only operation in a relational database. It defines **what** columns you want, **from which** tables, and optionally how the rows should be filtered, ordered, grouped, or limited.

SELECT name, price FROM products;
SELECT * FROM employees; -- all columns (discouraged in production)
⚠️ Why SELECT * Is Discouraged
  • Pulls every column – can cause unnecessary I/O and network traffic.
  • Masks schema changes (new columns appear silently in result sets).
  • Exposes sensitive columns (password hashes, internal IDs) if you forget to prune them.

2️⃣ Aliases – Giving Columns & Tables Friendly Names

Aliases improve readability, especially in joins or when column names clash.

SELECT first_name AS name FROM users;
SELECT u.first_name AS fname,
       u.last_name  AS lname FROM users AS u;

Table aliases also let you qualify columns (t1.id) which is mandatory when two tables share a column name.

3️⃣ Filtering Rows – WHERE Clause

SELECT name, price
FROM products
WHERE price > 100 AND category = 'Electronics';

Common comparison operators: =, <>, >, <, >=, <= and set operators IN, NOT IN. Use LIKE for pattern matching.

SELECT *
FROM employees
WHERE email LIKE '%@example.com';   -- ends with @example.com

4️⃣ Ordering & Limiting – ORDER BY, LIMIT/OFFSET

SELECT name, price
FROM products
WHERE price > 0
ORDER BY price DESC
LIMIT 10 OFFSET 20;   -- pagination: page 3 (0‑based)

Use ASC (default) for ascending order. OFFSET is optional – without it, LIMIT 10 simply returns the first ten rows.

5️⃣ Removing Duplicates – DISTINCT

SELECT DISTINCT department_id FROM employees;

DISTINCT works on the whole projected row. If you need distinct on a single column but keep other columns, use a GROUP BY (see next section).

6️⃣ Aggregates & Grouping – Summaries & Statistics

FunctionDescriptionExample
COUNT(col)Number of non‑NULL valuesSELECT COUNT(*) FROM orders;
SUM(col)Total of numeric columnSELECT SUM(amount) FROM sales;
AVG(col)Mean valueSELECT AVG(salary) FROM employees;
MIN(col)Smallest valueSELECT MIN(price) FROM products;
MAX(col)Largest valueSELECT MAX(price) FROM products;

To obtain aggregates **per group**, combine them with GROUP BY:

SELECT department_id,
       COUNT(*)                AS employee_cnt,
       AVG(salary)             AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;   -- keep only large departments

HAVING filters groups after aggregation (equivalent to WHERE for rows).

7️⃣ Joins – Bringing Data Together

Four basic join types (inner, left/right outer, full outer, cross). The most common is INNER JOIN.

SELECT e.first_name,
       e.last_name,
       d.name AS department
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id
WHERE d.name = 'Engineering'
ORDER BY e.last_name;

For left/right outer join:

SELECT c.name AS customer,
       o.id   AS order_id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
ORDER BY c.name;

Use FULL OUTER JOIN when you need rows that appear in *either* table (PostgreSQL supports it; MySQL does not natively).

8️⃣ Set Operators – Combining Result Sets

SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;   -- eliminates duplicates

SELECT email FROM customers
INTERSECT
SELECT email FROM newsletter_subscribers;   -- only common emails

SELECT email FROM customers
EXCEPT
SELECT email FROM newsletter_subscribers;   -- in customers but not in subscribers

All participating SELECTs must have the **same column count** and **compatible data types**.

9️⃣ Sub‑queries – Nesting SELECTs

Inline (derived table)

SELECT dept_id, avg_salary
FROM (
    SELECT department_id AS dept_id,
           AVG(salary)    AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_avg
WHERE avg_salary > 80000;

Correlated sub‑query (depends on outer row)

SELECT e.id, e.first_name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) FROM employees
    WHERE department_id = e.department_id
);

Correlated sub‑queries can be slower; many RDBMS transform them into joins automatically.

10️⃣ Window Functions – Row‑Level Analytics

SELECT employee_id,
       salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
       RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
ORDER BY department_id, salary DESC;

Window functions keep the original row granularity while adding a calculated column (e.g., running total, moving average, rank).

11️⃣ Performance Tips – EXPLAIN and Indexes

Before writing a complex SELECT, inspect the execution plan:

EXPLAIN ANALYZE
SELECT e.first_name, e.last_name, d.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Engineering'
ORDER BY e.last_name
LIMIT 50;

The output tells you whether an **index scan**, **sequential scan**, or **hash join** is used. If you see “Seq Scan” on a large table with a WHERE clause on an indexed column, create an index:

CREATE INDEX idx_emp_dept ON employees(department_id);
CREATE INDEX idx_dept_name ON departments(name);

Remember: each index speeds reads but slows INSERT/UPDATE/DELETE and consumes storage.

12️⃣ Safety – Avoiding SQL Injection in SELECTs

Even a read‑only query can be hijacked to return data you never intended. Always use **parameterised queries** / **prepared statements** in application code.

Python (psycopg2) – Safe SELECT
import psycopg2

conn = psycopg2.connect(dsn='dbname=mydb user=app')
cur  = conn.cursor()
cur.execute(
    "SELECT name, price FROM products WHERE price > %s AND category = %s;",
    (100, 'Electronics')
)
rows = cur.fetchall()
for r in rows:
    print(r)

Never concatenate user‑input directly into the SQL string.

13️⃣ Cheat‑Sheet – SELECT Syntax Overview

ClausePurposeTypical Syntax
SELECTColumns / expressions to returnSELECT col1, col2, …
FROMSource table(s) / subqueryFROM table_name [AS t]
WHERERow filterWHERE condition
GROUP BYAggregate groupingGROUP BY col1, col2
HAVINGFilter on aggregated groupsHAVING aggregate_condition
ORDER BYSort result setORDER BY col [ASC|DESC]
LIMIT / OFFSETPaginationLIMIT 10 OFFSET 20
DISTINCTRemove duplicate rowsSELECT DISTINCT col FROM …
JOINCombine tablesJOIN other_table ON condition
UNION / INTERSECT / EXCEPTSet algebraSELECT … UNION SELECT …
WINDOWAnalytic functionsOVER (PARTITION BY … ORDER BY …)

14️⃣ Sample Schema – Customers, Orders, Products

ER‑diagram (Customers ←→ Orders → Products) placeholder

15️⃣ Lab Exercises (Extended)

🎯 Exercise 2.1 – Build the Sample Schema

Using the diagram above, create three tables in PostgreSQL (or SQLite) with appropriate primary/foreign keys. Then insert at least 10 rows per table (use any fictional data you like).

🎯 Exercise 2.2 – Basic Queries
  1. List all products cheaper than $50, ordered by price ascending.
  2. Show the total amount each customer has spent (use SUM and GROUP BY).
  3. Find customers who placed more than 3 orders (use HAVING).
  4. Display the top 5 most‑expensive orders (use ORDER BY and LIMIT).
🎯 Exercise 2.3 – Joins & Sub‑queries
  1. Show each order with the customer’s name and the product’s name (inner join across three tables).
  2. Find products that have never been ordered (left join + WHERE order_id IS NULL).
  3. Write a correlated sub‑query that returns customers whose total spend is above the **average** spend of all customers.
🎯 Exercise 2.4 – Window Functions

For each order, compute the running total per customer (order rows sorted by order_date) using SUM() OVER (PARTITION BY … ORDER BY …). Capture the query and a screenshot of the result.

16️⃣ Knowledge‑Check Quiz (7 questions)

  1. Which clause must appear **first** in a SELECT statement?
    FROM
    SELECT
    WHERE
  2. What does the following query return?
    SELECT DISTINCT department_id FROM employees;

    All rows from employees (including duplicates).
    A list of **unique** department IDs.
    The number of employees per department.
  3. True or False – HAVING can be used without GROUP BY.
    True
    False
  4. Which of the following retrieves the 3rd page (page size = 20) of ordered results?
    LIMIT 20 OFFSET 40
    LIMIT 60 OFFSET 20
    OFFSET 40 LIMIT 20
  5. In a LEFT JOIN, rows from the **left** table are returned even when there is no matching row on the right. What value appears for right‑table columns in those cases?
    0
    NULL
    Empty string
  6. Which window function can be used to assign a **dense rank** (no gaps) to rows ordered by salary descending?
    ROW_NUMBER()
    DENSE_RANK()
    RANK()
  7. What is the primary benefit of using a **prepared statement** for a SELECT that contains user input?
    It makes the query run faster.
    It automatically adds LIMIT to protect the DB.
    It separates data from code, preventing SQL injection.
Show Answers
  1. BSELECT is always the first clause.
  2. B – Returns a list of distinct department IDs.
  3. FalseHAVING filters groups, so a GROUP BY (or an implicit grouping) is required.
  4. ALIMIT 20 OFFSET 40 fetches rows 41‑60 → page 3 (0‑based offset).
  5. B – Missing right‑side rows are filled with NULL.
  6. BDENSE_RANK() gives a consecutive rank without gaps.
  7. C – Parameters are bound separately from the SQL string, eliminating injection vectors.

17️⃣ Further Reading & Resources

Lecture 03 · Fundamentals

Filtering & Sorting

Beginner ~45 min

1️⃣ The WHERE Clause – Selecting the Right Rows

WHERE is the gate‑keeper of a SELECT statement. Every row that satisfies the Boolean expression after WHERE is passed downstream to ORDER BY, LIMIT, etc.

1.1 Comparison & Logical Operators

OperatorDescriptionExample
=Equalityprice = 199.99
<> or !=Inequalitycategory <> 'Books'
> < >= <=Numeric / date comparisonage >= 21
BETWEEN … AND …Inclusive rangeprice BETWEEN 50 AND 150
IN ( … )Set membershipstatus IN ('active','pending')
LIKE / ILIKEPattern matching (wildcards % and _)email LIKE '%@example.com'
IS NULL / IS NOT NULLNull test (cannot use = NULL)last_login IS NULL
AND / OR / NOTBoolean combinatorscountry = 'USA' AND age >= 18
EXISTS (sub‑query)True if sub‑query returns at least one rowWHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
ANY / ALLCompare to a set of valuessalary > ANY (SELECT salary FROM employees WHERE department = 'HR')

1.2 Practical WHERE Samples

SELECT *
FROM products
WHERE price > 100
  AND stock >= 10
  AND category IN ('Electronics','Appliances')
  AND discontinued IS NULL;

SELECT *
FROM users
WHERE country = 'USA'
  AND age >= 18
  AND email LIKE '%@example.com';

SELECT *
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';

SELECT *
FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d WHERE d.id = e.department_id
    AND d.name = 'Engineering'
);
⚠️ Performance Tip

**Never** wrap an indexed column in a function in the WHERE clause (e.g., WHERE UPPER(name) = 'ALICE') – this defeats the index. Use a functional index or a case‑insensitive collation instead.

2️⃣ Ordering Results – ORDER BY

2.1 Basic Syntax

Sort by one or more columns, each optionally marked ASC (default) or DESC.

SELECT *
FROM products
ORDER BY price DESC;   -- expensive first

SELECT *
FROM employees
ORDER BY department_id ASC, last_name ASC;

2.2 Controlling NULL Placement

Some engines put NULL values first, others last. Use the explicit NULLS FIRST or NULLS LAST clause.

SELECT *
FROM users
ORDER BY last_login DESC NULLS LAST;

2.3 Case‑Insensitive / Locale‑Aware Sorting

PostgreSQL lets you attach a collation directly in the ORDER BY clause:

SELECT name
FROM customers
ORDER BY name COLLATE \"C\";   -- case‑insensitive, language‑aware

2.4 Ordering by an Expression (Custom Sort)

Advanced use‑cases often need a derived value, e.g., sorting by the length of a string:

SELECT product_name, price
FROM products
ORDER BY LENGTH(product_name) ASC, price DESC;

2.5 Random Ordering (caution!)

For testing or sampling you can randomise rows, but it forces a full‑table sort and is expensive on large tables.


SELECT *
FROM quotes
ORDER BY RANDOM()
LIMIT 1;


SELECT *
FROM quotes
ORDER BY RAND()
LIMIT 1;

3️⃣ Pagination – LIMIT & OFFSET

Web applications often need “pages” of data. The canonical pattern is:

SELECT *
FROM products
ORDER BY product_id
LIMIT 20          -- page size
OFFSET 40;        -- skip first two pages (0‑based)

Tip: Combine pagination with a stable ordering column (e.g., primary key) to get deterministic results.

Keyset Pagination (Seek Method)

When a table grows, OFFSET becomes slower. Use “keyset pagination”:

SELECT *
FROM products
WHERE product_id > 120   -- last ID from previous page
ORDER BY product_id
LIMIT 20;

4️⃣ Performance – How Filtering & Sorting Interact with Indexes

4.1 Index Usage Basics

  • A B‑tree index can satisfy both WHERE predicates and ORDER BY if the sort order matches the index definition.
  • Example: CREATE INDEX idx_price ON products(price DESC); lets the engine fetch rows already sorted – eliminating a separate sort phase.

4.2 EXPLAIN – Visualising the Plan

EXPLAIN ANALYZE
SELECT *
FROM products
WHERE price > 100
ORDER BY price DESC
LIMIT 10;

The output shows whether the planner uses an Index Scan (good) or a Seq Scan + Sort (costly).

4.3 Common Pitfalls

  • Applying a function to an indexed column (WHERE LOWER(email) = 'alice@example.com') disables index usage. Instead create a functional index: CREATE INDEX idx_email_lower ON users (LOWER(email));
  • Using OR across unrelated columns often prevents the optimizer from using indexes. Split the query with UNION ALL if possible.
  • Sorting on a calculated column (e.g., ORDER BY price * tax_rate) forces a temporary file sort.

5️⃣ Security – Safe Filtering & Sorting

Even read‑only queries can be abused for data exfiltration if user input is concatenated unchecked.

Node.js (pg) – Parameterised SELECT
const { Client } = require('pg');
const client = new Client(/* connection params */);

async function getProducts(minPrice, maxPrice, sortCol, sortDir) {
    // whitelist column names & sort direction to prevent injection
    const allowedCols = ['price', 'name', 'created_at'];
    const allowedDir = ['ASC', 'DESC'];

    if (!allowedCols.includes(sortCol) || !allowedDir.includes(sortDir)) {
        throw new Error('Invalid sort parameters');
    }

    const sql = `SELECT *
                 FROM products
                 WHERE price BETWEEN $1 AND $2
                 ORDER BY ${sortCol} ${sortDir}
                 LIMIT 20`;
    const res = await client.query(sql, [minPrice, maxPrice]);
    return res.rows;
}`;
    

**Key take‑aways**:

  • Never interpolate raw user input directly into SQL (even for ORDER BY).
  • Whitelist column names / directions, or map them to an enum.
  • Use prepared statements for values ($1, $2 in PostgreSQL, ? in MySQL).

6️⃣ Cheat‑Sheet – Filtering & Sorting Summary

ClausePurposeTypical Syntax
WHERERow filteringWHERE col = 5 AND other < 10
BETWEEN … AND …Inclusive rangeWHERE price BETWEEN 20 AND 30
IN ( … )Set membershipWHERE status IN ('open','closed')
LIKE / ILIKEPattern matchingWHERE name LIKE 'A%'
IS NULL / IS NOT NULLNull testWHERE deleted_at IS NULL
EXISTS (sub‑query)True if sub‑query returns rowsWHERE EXISTS (SELECT 1 …)
ORDER BYSort result setORDER BY col1 DESC, col2 ASC
NULLS FIRST / LASTExplicit null orderingORDER BY created_at DESC NULLS LAST
LIMIT / OFFSETPaginationLIMIT 20 OFFSET 40
CASE … ENDCustom sort orderORDER BY CASE status WHEN 'high' THEN 1 … END

7️⃣ Sample Schema – Customers, Orders, Products

ER‑diagram (Customers ←→ Orders ←→ Products) placeholder

8️⃣ Lab Exercises (Hands‑On)

🎯 Exercise 3.1 – Filtering Challenge

Using the sample schema (create customers, orders, products tables with at least 20 rows each):

  1. Find all orders placed in the last 30 days with a total amount > $200.
  2. List customers whose email ends with @example.org **and** who have never placed an order.
  3. Show products that have never been ordered (hint: LEFT JOIN + WHERE order_id IS NULL).
  4. Retrieve the top 5 most‑expensive products, ordered by price descending.

Capture the SQL statements and a screenshot of each query’s result.

🎯 Exercise 3.2 – Sorting & Pagination
  1. Write a query that sorts customers by last_name ascending, then first_name descending.
  2. Implement keyset pagination for the orders table (use order_id as the cursor).
  3. Compare the execution time of the same query using OFFSET/LIMIT vs. keyset pagination on a table with 10 000 rows.
  4. Explain why the keyset method is faster on large offsets.
🎯 Exercise 3.3 – Index Impact

On the products table:

  1. Create a B‑tree index on price DESC.
  2. Run the query SELECT * FROM products WHERE price > 500 ORDER BY price DESC LIMIT 20; and capture the EXPLAIN ANALYZE output before and after the index.
  3. Summarise (≤ 100 words) how the index changed the plan and execution time.

9️⃣ Knowledge‑Check Quiz (7 questions)

  1. Which operator correctly checks if a column does not contain a NULL value?
    = NULL
    IS NOT NULL
    != NULL
  2. What does the following query return?
    SELECT *
    FROM employees
    WHERE salary BETWEEN 50000 AND 60000;

    Employees whose salary is exactly $50,000 or $60,000.
    Employees earning **inclusive** of $50k – $60k.
    Employees whose salary is **outside** the range.
  3. True or False – ORDER BY column DESC NULLS FIRST puts the highest non‑null values first, then nulls.
    True
    False
  4. Which of the following is **not** a valid way to avoid SQL injection in an ORDER BY clause?
    Whitelist allowed column names in application code.
    Use a prepared statement and bind the column name as a parameter.
    Validate that the sort direction is either “ASC” or “DESC”.
  5. How would you retrieve the 3rd page of results (page size = 25) ordered by created_at descending?
    LIMIT 25 OFFSET 50
    LIMIT 75 OFFSET 0
    OFFSET 75 LIMIT 25
  6. Which expression will correctly sort rows so that NULL values appear **last** regardless of ASC/DESC?
    ORDER BY col ASC NULLS FIRST
    ORDER BY col DESC NULLS LAST
    ORDER BY ISNULL(col) ASC, col ASC
  7. What is the main performance advantage of a “keyset pagination” query over an OFFSET/LIMIT query on a large table?
    It avoids scanning the entire table up to the offset.
    It uses a hash index instead of a B‑tree.
    It allows the DB to cache the whole result set.
Show Answers
  1. BIS NOT NULL is the correct null‑check operator.
  2. BBETWEEN is inclusive of both bounds.
  3. True – “NULLS FIRST” puts nulls before any non‑null values, regardless of ASC/DESC.
  4. B – Prepared statements cannot bind identifiers (column names); you must whitelist or map them.
  5. A – 3rd page (0‑based) => OFFSET = 25 × 2 = 50, LIMIT = 25.
  6. C – The trick ORDER BY ISNULL(col) ASC, col ASC forces nulls to the end for both directions.
  7. A – Keyset pagination reads from a known cursor, avoiding the cost of skipping rows.

10️⃣ Further Reading & Resources

Lecture 04 · Fundamentals

Joins – Combining Data From Multiple Tables

Beginner ~55 min

0️⃣ Why Joins Matter

Real‑world databases are normalized: data is split across related tables to avoid redundancy. A join stitches those pieces back together so a security analyst (or any data‑consumer) can answer questions such as:

  • Which user bought the most‑expensive product?
  • What is the total revenue per country?
  • Are there any “orphaned” orders without a matching user?

1️⃣ Sample Schema (Orders, Users, Products, Order_Items)

ER diagram (Users ↔ Orders ↔ Order_Items ↔ Products) placeholder

Below is the DDL you can paste into any RDBMS (PostgreSQL/MySQL/SQLite) to create the demo data.

schema.sql
CREATE TABLE users (
    id          SERIAL      PRIMARY KEY,
    name        VARCHAR(50)   NOT NULL,
    email       VARCHAR(100)  UNIQUE,
    country     VARCHAR(50)
);

CREATE TABLE products (
    id          SERIAL      PRIMARY KEY,
    sku         VARCHAR(30)   UNIQUE,
    name        VARCHAR(100)  NOT NULL,
    price       NUMERIC(9,2) NOT NULL
);

CREATE TABLE orders (
    id          SERIAL      PRIMARY KEY,
    user_id     INT         REFERENCES users(id),
    order_date  TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    id          SERIAL      PRIMARY KEY,
    order_id    INT         REFERENCES orders(id),
    product_id  INT         REFERENCES products(id),
    quantity    INT         DEFAULT 1
);

2️⃣ Join Types – Theory (with Venn diagrams)

Join TypeResult SetTypical Syntax
INNER JOIN Only rows that have matching keys in *both* tables. SELECT … FROM A INNER JOIN B ON A.id = B.fk
LEFT (OUTER) JOIN All rows from the left table + matching rows from the right; NULL where no match. SELECT … FROM A LEFT JOIN B ON A.id = B.fk
RIGHT (OUTER) JOIN All rows from the right table + matching rows from the left; NULL where no match. SELECT … FROM A RIGHT JOIN B ON A.id = B.fk
FULL (OUTER) JOIN All rows from both tables; NULLs where one side is missing. SELECT … FROM A FULL OUTER JOIN B ON A.id = B.fk
CROSS JOIN Cartesian product (every row of A paired with every row of B). Use with care! SELECT … FROM A CROSS JOIN B
NATURAL JOIN Implicit join on all columns with the same name. Rarely used in security reporting – can hide bugs. SELECT … FROM A NATURAL JOIN B

Below are simple Venn‑diagram placeholders for each join type. Replace them with real SVG/PNG graphics in your LMS.

JoinDiagram
INNER
Venn – overlapping area only
LEFT
All of left + overlap
RIGHT
All of right + overlap
FULL
Entire left & right circles
CROSS
Cartesian grid

3️⃣ Practical Join Syntax – Variations & Shortcuts

3.1 Using USING (when column names match)

SELECT o.id, u.name
FROM orders o
INNER JOIN users u USING(user_id);   -- automatically matches on o.user_id = u.id

3.2 Table Aliases – make queries readable

SELECT o.id AS order_id,
       u.name AS buyer,
       p.name AS product,
       oi.quantity
FROM order_items oi
JOIN orders o   ON oi.order_id   = o.id
JOIN users u    ON o.user_id    = u.id
JOIN products p ON oi.product_id = p.id
WHERE p.price > 1000
ORDER BY o.id;

3.3 Self‑Join – hierarchical data (e.g., employee‑manager)

SELECT e.name AS employee,
       m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY e.name;

3.4 Joining More Than Two Tables

The same JOIN keyword can be chained; the DB engine builds an internal join tree.

SELECT u.name,
       SUM(p.price * oi.quantity) AS total_spent
FROM users u
JOIN orders o       ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p    ON oi.product_id = p.id
GROUP BY u.name
HAVING total_spent > 5000
ORDER BY total_spent DESC;

4️⃣ Common Pitfalls & Performance Tips

  • Cartesian product – forgetting the ON clause (or using CROSS JOIN) creates huge result sets that can crash the DB or hide the real problem.
  • Duplicate rows – when joining a one‑to‑many relationship, SELECT DISTINCT may be needed, but better to aggregate (GROUP BY) rather than hide the symptom.
  • Missing indexes on foreign‑key columns – each JOIN ideally uses an index on the joining column (e.g., CREATE INDEX idx_orders_user_id ON orders(user_id);).
  • Ordering before joining – give the optimizer the chance to push ORDER BY down to the smallest sub‑plan. Usually you order after the final join.
  • Implicit joins (comma‑separated FROM) – legacy syntax FROM a, b WHERE a.id=b.id is harder to read and more error‑prone.

4.1 How the Optimizer Builds a Join Tree

Modern engines (PostgreSQL, MySQL, SQL‑Server) evaluate joins in the order that yields the lowest estimated cost. You can view the plan with EXPLAIN (or EXPLAIN ANALYZE for real timing).

EXPLAIN ANALYZE
SELECT u.name, o.id, p.name, oi.quantity
FROM users u
JOIN orders o       ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p    ON oi.product_id = p.id
WHERE p.price > 1000
LIMIT 20;

The output will show whether each join is using an Index Scan, a Hash Join, or a Nested Loop. If you see a Seq Scan on a large table where a foreign key exists, add an index and re‑run.

5️⃣ Hands‑On Lab – Join Challenges

🎯 Exercise 4.1 — Build the Sample Database

Import schema.sql (provided above) into your favourite RDBMS. Populate it with at least the following rows (feel free to add more):

TableSample Rows
users
INSERT INTO users (name,email,country) VALUES
('Alice','alice@example.com','USA'),
('Bob','bob@example.com','India'),
('Charlie','charlie@example.com','UK');
products
INSERT INTO products (sku,name,price) VALUES
('SKU-001','Gaming Laptop',2500.00),
('SKU-002','Office Chair',150.00),
('SKU-003','USB‑C Hub',45.99);
orders
INSERT INTO orders (user_id,order_date) VALUES
(1,'2024-04-01'),(2,'2024-04-03'),(1,'2024-04-05');
order_items
INSERT INTO order_items (order_id,product_id,quantity) VALUES
(1,1,1),(1,3,2),
(2,2,1),
(3,1,1),(3,2,1);
🎯 Exercise 4.2 — Write Five Queries
  1. Inner join – List every order (order_id) together with the buyer’s name and the total amount for that order (sum of price × quantity).
  2. Left join – Show all users and the list of orders they placed; include users with zero orders (display NULL for order_id).
  3. Self‑join – Assuming the users table had a referrer_id column (who invited them), write a query that shows each user and the name of the referrer. (You may add a dummy column to practice.)
  4. Cross join (Cartesian product) – caution – Show the total number of possible user × product pairs. Then limit the output to the first 5 rows to avoid flooding the console.
  5. Full outer join (PostgreSQL only) – Return every user and every order, even if there is no matching pair (use FULL OUTER JOIN). Identify “orphaned” orders (no user) and “orphaned” users (no order) in the same result set.

Take a screenshot of each query’s result and include it in a short lab‑report (max 300 words) describing what you learned.

🎯 Exercise 4.3 — Index Performance Test

Run the inner‑join revenue query from Exercise 4.2 on a table with at least 10 000 rows (you can generate dummy data with a script). Capture the EXPLAIN ANALYZE output before adding indexes on orders.user_id, order_items.order_id and order_items.product_id. Then add the indexes, re‑run the same EXPLAIN ANALYZE and compare the total execution time.

Write a 150‑word summary explaining the difference.

6️⃣ Cheat‑Sheet – Join Syntax at a Glance

SyntaxWhen to Use
FROM A INNER JOIN B ON A.id = B.fk Only records with matching keys (most common).
FROM A LEFT JOIN B ON A.id = B.fk All rows from A; keep B data where it exists (good for “orphan detection”).
FROM A RIGHT JOIN B ON A.id = B.fk All rows from B; rare in practice (swap tables for LEFT JOIN).
FROM A FULL OUTER JOIN B ON A.id = B.fk Show every row from both tables – good for data‑reconciliation.
FROM A CROSS JOIN B Cartesian product; use only for combinatorial generation (e.g., all seat‑price combos).
FROM A JOIN B USING(col) When column names are identical – less typing.
FROM A NATURAL JOIN B Implicit join on all same‑named columns – **avoid** unless you fully control schema.

7️⃣ Knowledge‑Check Quiz (7 questions)

  1. Which join returns every row from the left table and NULLs for the right table when there is no match?
    INNER JOIN
    LEFT JOIN
    FULL OUTER JOIN
  2. What is the result of a CROSS JOIN between a table with 5 rows and another with 8 rows?
    13 rows
    40 rows
    5 rows
  3. True or False – NATURAL JOIN is safe to use in a security‑report because it guarantees you are joining on the right columns.
  4. Which clause can be used to eliminate duplicate rows after a join?
    WHERE
    GROUP BY
    DISTINCT
  5. You have an orders table (order_id, user_id) and a users table (id, name). Which index will most improve the performance of the following query?
    SELECT o.id, u.name
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.country = 'USA';
    Index on orders.id
    Index on users.country
    Composite index on orders(user_id)
  6. Which of the following statements is incorrect about LEFT JOIN semantics?
    The left table’s rows are never filtered out.
    NULLs appear for right‑table columns when there is no match.
    LEFT JOIN is equivalent to RIGHT JOIN with the table order swapped.
  7. In PostgreSQL, which command displays the actual join execution plan with timing information?
    EXPLAIN
    EXPLAIN ANALYZE
    SHOW PLAN
Show Answers
  1. BLEFT JOIN keeps all rows from the left side.
  2. B – 5 × 8 = 40 rows (Cartesian product).
  3. FalseNATURAL JOIN implicitly matches on *all* same‑named columns; a stray column with the same name can cause an unintended join.
  4. CDISTINCT removes duplicate rows after a join.
  5. C – A composite index on orders(user_id) (or a separate index on users.id, which is already a primary key) will help the join; the filter on users.country is secondary.
  6. C – LEFT JOIN is not the same as RIGHT JOIN simply by swapping order; the side that receives NULLs matters.
  7. BEXPLAIN ANALYZE runs the query and prints real execution times.

8️⃣ Further Reading & Templates

Lecture 05 · Fundamentals

Aggregation & GROUP BY

Beginner ~50 min

0️⃣ Why Aggregation Is a Core Skill

Aggregation turns rows into business‑level numbers. In a penetration‑test report you’ll often need to say:

  • 142 hosts responded on port 22.”
  • “The average CVSS score of all critical findings is 9.2.”
  • “Total data exfiltrated: 4.3 GB.”
All of those statements are produced with COUNT(), SUM(), AVG() and friends, combined with GROUP BY.

1️⃣ Sample Schema (Orders ↔ Users ↔ Products)

ER‑diagram (Users ↔ Orders ↔ Order_Items ↔ Products) placeholder

Copy‑paste the DDL below into any RDBMS (PostgreSQL, MySQL or SQLite) and populate it with a few rows – you’ll need it for all the lab exercises.

schema.sql
CREATE TABLE users (
    id      SERIAL      PRIMARY KEY,
    name    VARCHAR(50)   NOT NULL,
    country VARCHAR(30)
);

CREATE TABLE products (
    id    SERIAL      PRIMARY KEY,
    sku   VARCHAR(30)   UNIQUE,
    name  VARCHAR(100)  NOT NULL,
    price NUMERIC(9,2) NOT NULL,
    stock INT         DEFAULT 0
);

CREATE TABLE orders (
    id         SERIAL      PRIMARY KEY,
    user_id    INT         REFERENCES users(id),
    order_date TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    id         SERIAL      PRIMARY KEY,
    order_id   INT         REFERENCES orders(id),
    product_id INT         REFERENCES products(id),
    qty        INT         DEFAULT 1
);

2️⃣ Core Aggregate (Summary) Functions

FunctionWhat It DoesTypical Example
COUNT(*)Counts rows (includes NULLs)SELECT COUNT(*) FROM products;
COUNT(col)Counts non‑NULL values of colSELECT COUNT(email) FROM users;
SUM(col)Adds all numeric valuesSELECT SUM(stock) FROM products;
AVG(col)Average (arithmetic mean)SELECT AVG(price) FROM products;
MIN(col)Smallest valueSELECT MIN(order_date) FROM orders;
MAX(col)Largest valueSELECT MAX(price) FROM products;
STRING_AGG(col, ', ') (Postgres) / GROUP_CONCAT(col SEPARATOR ', ') (MySQL)Concatenates strings per groupSELECT country, STRING_AGG(name, ', ') FROM users GROUP BY country;
JSON_AGG(col) (Postgres)Returns a JSON array per groupSELECT JSON_AGG(row_to_json(p)) FROM products p GROUP BY category;

All aggregate functions ignore NULL values automatically (except COUNT(*) which counts rows regardless of NULL).

3️⃣ Basic GROUP BY Syntax

SELECT category,
       COUNT(*)            AS product_cnt,
       AVG(price)          AS avg_price,
       SUM(stock * price) AS inventory_value
FROM products
GROUP BY category
ORDER BY inventory_value DESC;

Rules to remember:

  • Every column in the SELECT list must either be an aggregate or appear in the GROUP BY clause.
  • Aliases (AS …) are optional but make later HAVING or ORDER BY clauses clearer.
  • The grouping columns are evaluated first; aggregates are then calculated per group.

4️⃣ Filtering Groups – HAVING

HAVING works like WHERE but **after** aggregation.

SELECT country,
       COUNT(*)            AS user_cnt
FROM users
GROUP BY country
HAVING COUNT(*) >= 5
ORDER BY user_cnt DESC;

Typical security‑report use‑cases:

  • Show only assets that have **≥ 3** high‑severity findings.
  • List countries where the *average* CVSS of discovered vulns exceeds 7.

5️⃣ Advanced Grouping – ROLLUP, CUBE, GROUPING SETS

These constructs let you produce **multiple aggregation levels** in a single query.

5.1 ROLLUP – Hierarchical Sub‑Totals

SELECT u.country,
       EXTRACT(YEAR FROM o.order_date) AS yr,
       SUM(p.price * oi.qty) AS revenue
FROM orders o
JOIN users u           ON o.user_id = u.id
JOIN order_items oi  ON o.id = oi.order_id
JOIN products p       ON oi.product_id = p.id
GROUP BY ROLLUP(u.country, EXTRACT(YEAR FROM o.order_date))
ORDER BY u.country, yr;

Result rows include:

  • Revenue per (country, year).
  • Yearly subtotal per country (country, NULL).
  • Grand total (both columns NULL).

5.2 CUBE – All Possible Combinations

SELECT u.country,
       EXTRACT(YEAR FROM o.order_date) AS yr,
       COUNT(DISTINCT o.id) AS orders_cnt
FROM orders o
JOIN users u ON o.user_id = u.id
GROUP BY CUBE(u.country, EXTRACT(YEAR FROM o.order_date))
ORDER BY u.country, yr;

CUBE returns every combination, i.e., per country, per year, per country‑year, plus the grand total.

5.3 GROUPING SETS – Hand‑Picked Mix

SELECT u.country,
       EXTRACT(MONTH FROM o.order_date) AS month,
       SUM(p.price * oi.qty) AS rev
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
GROUP BY GROUPING SETS(
    (u.country, EXTRACT(MONTH FROM o.order_date)),
    (u.country),
    (EXTRACT(MONTH FROM o.order_date))
)
ORDER BY u.country, month;

Only the three specified groupings appear – no grand total.

6️⃣ Window (Analytic) Functions – Aggregates Without Collapse

Sometimes you need a running total or a rank while still seeing each individual order. Window functions let you do that.

6.1 Syntax Overview

SELECT o.id,
       u.name,
       p.name AS product,
       oi.qty,
       p.price,
       SUM(p.price * oi.qty) OVER(
           PARTITION BY u.id
           ORDER BY o.order_date
       ) AS cumulative_spend
FROM orders o
JOIN users u          ON o.user_id = u.id
JOIN order_items oi   ON o.id = oi.order_id
JOIN products p       ON oi.product_id = p.id
ORDER BY u.id, o.order_date;

6.2 Real‑World Example – Top‑3 Products per Country

SELECT u.country,
       p.name AS product,
       SUM(p.price * oi.qty) AS revenue,
       RANK() OVER(PARTITION BY u.country
                     ORDER BY SUM(p.price * oi.qty) DESC) AS rnk
FROM order_items oi
JOIN orders o      ON oi.order_id = o.id
JOIN users u       ON o.user_id = u.id
JOIN products p    ON oi.product_id = p.id
GROUP BY u.country, p.name
HAVING SUM(p.price * oi.qty) > 0
ORDER BY u.country, rnk
LIMIT 30;   -- 10 countries × top‑3 each

Window functions are **not** listed in GROUP BY; they are calculated after the grouping step.

7️⃣ Performance Tips for Aggregations

  • Index the grouping columns. An index on products.category speeds GROUP BY category.
  • Covering indexes (e.g., CREATE INDEX idx_products_cat_price ON products(category, price);) let the engine satisfy both the grouping and the aggregation from the index alone.
  • Materialized views for heavy daily reports (e.g., CREATE MATERIALIZED VIEW daily_sales AS …).
  • Filter early. Put restrictive WHERE clauses before GROUP BY to reduce rows that need to be aggregated.
  • Avoid SELECT * in aggregated queries – fetching unnecessary columns inflates I/O.
  • Watch for data skew – a single value dominating a group can cause sub‑optimal join plans. Consider partitioning or separate statistics.

7.1 Index Impact Demo

EXPLAIN ANALYZE
SELECT category,
       COUNT(*)       AS cnt,
       AVG(price)   AS avg_price
FROM products
GROUP BY category;

If the plan shows a Seq Scan on a million‑row table, create an index:

CREATE INDEX idx_products_category ON products(category);

Rerun the EXPLAIN ANALYZE – you should see an Index Scan and a dramatically lower total cost.

8️⃣ Common Pitfalls & How to Avoid Them

  • Missing GROUP BY clause – “column must appear in the GROUP BY clause” error.
  • Counting with COUNT(col) when you need total rowsCOUNT(col) skips NULLs.
  • Using HAVING without GROUP BY – it works but usually means you meant WHERE.
  • Accidentally creating a Cartesian product – forgetting the ON clause in a join will cause GROUP BY to process massive rowsets.
  • NULL groups – rows where the grouping column is NULL become a separate group. Use COALESCE(col,'N/A') if you prefer them merged.
  • Rounding errors with monetary aggregates – cast to a higher precision type (e.g., price::numeric(12,4)) before summing.

9️⃣ Hands‑On Lab – Aggregation Challenges

🎯 Exercise 5.1 — Load the Sample Schema

Run schema.sql (Section 1). Insert at least 20 products spanning 4 categories, 15 users across 3 countries, and 30 orders with associated order_items. Feel free to use any data you like – the point is to have enough rows for meaningful grouping.

🎯 Exercise 5.2 — Write Five Aggregation Queries
  1. Total inventory value per categorySUM(price * stock) grouped by category.
  2. Average order value per country – join the four tables and AVG(price * qty) grouped by users.country.
  3. Top‑3 selling products overall – order by SUM(qty) descending, LIMIT 3.
  4. Countries with more than 5 orders – use GROUP BY users.country and HAVING COUNT(DISTINCT orders.id) > 5.
  5. Running total of revenue per user (window function) – calculate a cumulative sum ordered by order_date.

Capture a screenshot of each result and write a one‑sentence interpretation (e.g., “Electronics holds the highest inventory value: $125 k”).

🎯 Exercise 5.3 — Performance Test with ROLLUP

Run the ROLLUP query from Section 5.1 on a dataset of at least 10 000 orders. Record the execution time with EXPLAIN ANALYZE **before** adding any indexes, then create indexes on orders.order_date and users.country and repeat. Write a ≤ 100‑word summary of the performance impact.

10️⃣ Cheat‑Sheet – Aggregation & GROUP BY Quick Reference

Clause / FunctionPurpose / Example
COUNT(*)Rows in the result set.
COUNT(col)Non‑NULL values in col.
SUM(col)Additive total.
AVG(col)Mean value.
MIN(col) / MAX(col)Smallest / largest value.
GROUP BY col1, col2 …Define grouping columns.
HAVING conditionFilter groups (post‑aggregation).
ORDER BY col ASC|DESCSort final rows.
ROLLUP(col1, col2)Hierarchical subtotals + grand total.
CUBE(col1, col2)All combination totals.
GROUPING SETS((…), (…))Custom mix of groupings.
SUM(col) OVER (PARTITION BY … ORDER BY …)Running total / rank without collapsing rows.
STRING_AGG(col, ', ') (Postgres) / GROUP_CONCAT(col SEPARATOR ', ') (MySQL)Concatenate strings per group.

11️⃣ Knowledge‑Check Quiz (7 questions)

  1. Which query returns the number of distinct product categories?
    SELECT COUNT(*) FROM products;
    SELECT COUNT(DISTINCT category) FROM products;
    SELECT COUNT(category) FROM products;
  2. What does this query do?
    SELECT country, COUNT(*) AS users
    FROM users
    GROUP BY country
    HAVING COUNT(*) > 3;
    A. Returns countries with **exactly** 3 users.
    B. Returns countries with **more than** 3 users.
    C. Returns every country regardless of user count.
  3. True or False – GROUP BY and DISTINCT are interchangeable.
  4. Which construct will give you a grand total, per‑country subtotal, per‑year subtotal, and per‑country‑year cells in a single result set?
    GROUP BY ROLLUP(country, YEAR(order_date))
    GROUP BY CUBE(country, YEAR(order_date))
    GROUP BY GROUPING SETS((country, YEAR), (country), (YEAR))
  5. In the window‑function example below, what is the effect of PARTITION BY u.id?
    SELECT o.id,
           SUM(p.price * oi.qty) OVER (PARTITION BY u.id ORDER BY o.order_date) AS cum_spend
    FROM orders o
    JOIN users u ON o.user_id = u.id
    JOIN order_items oi ON o.id = oi.order_id
    JOIN products p ON oi.product_id = p.id;
    A. Groups rows by user before calculating the cumulative sum.
    B. Filters rows to only those where u.id matches the current row.
    C. Orders the final result set by u.id.
  6. Which index will most improve this query’s performance?
    SELECT country, COUNT(*) AS cnt
    FROM users
    WHERE country = 'USA'
    GROUP BY country;
    Index on users.id.
    Composite index on (country, id).
    Index on users.country.
  7. What does the GROUPING() function return inside a ROLLUP query?
Show Answers
  1. BCOUNT(DISTINCT category) gives the number of unique categories.
  2. BHAVING COUNT(*) > 3 filters for groups with more than three users.
  3. FalseGROUP BY groups rows, while DISTINCT removes duplicate rows; they are not interchangeable.
  4. AROLLUP(country, YEAR(...)) yields the grand total, country subtotals, year subtotals, and country‑year cells.
  5. APARTITION BY u.id creates a separate window (bucket) per user, then computes the cumulative sum inside that bucket ordered by date.
  6. C – An index on the filtered column (users.country) allows the DB engine to satisfy both the WHERE and the GROUP BY efficiently.
  7. The GROUPING() function returns 1 for a column that is being aggregated at a higher level (i.e., when the column’s value is replaced by NULL in a subtotal or grand‑total row) and 0 otherwise. It is used to differentiate real NULL values from the artificial NULLs generated by ROLLUP/CUBE.

12️⃣ Further Reading & Templates

Lecture 06 · Core Concepts

Subqueries & CTEs

Intermediate ~60 min

Introduction to Subqueries

A subquery is a query written inside another SQL query. Subqueries allow developers to break complex problems into smaller logical steps.

Subqueries are commonly used for:

  • Filtering data
  • Comparing values
  • Generating temporary result sets
  • Performing calculations
  • Building advanced reports

Basic Subquery Example

A subquery can be placed inside a WHERE clause.

SELECT name
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);

The inner query calculates the average salary, while the outer query returns employees earning above that average.

How Subqueries Work

SQL executes the inner query first, then uses the returned result in the outer query.

This allows queries to become dynamic and data-driven.

Subqueries in the SELECT Clause

Subqueries can also appear inside the SELECT statement.

SELECT
    name,
    (
        SELECT department_name
        FROM departments
        WHERE departments.id = employees.department_id
    ) AS department
FROM employees;

This query retrieves employee names along with department names.

Subqueries in the FROM Clause

A subquery inside the FROM clause acts like a temporary table.

SELECT avg_salary.department_id,
       avg_salary.average
FROM (
    SELECT department_id,
           AVG(salary) AS average
    FROM employees
    GROUP BY department_id
) AS avg_salary;

This approach improves readability for complex calculations.

Single-Row vs Multi-Row Subqueries

Some subqueries return one value, while others return multiple rows.

Single-row subqueries are often used with operators such as:

  • =
  • >
  • <

Multi-row subqueries are commonly used with:

  • IN
  • ANY
  • ALL

Using IN with Subqueries

SELECT name
FROM employees
WHERE department_id IN (
    SELECT id
    FROM departments
    WHERE location = 'London'
);

This query finds employees working in departments located in London.

Correlated Subqueries

A correlated subquery depends on values from the outer query.

SELECT name,
       salary
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);

The inner query runs once for every row processed by the outer query.

Introduction to CTEs

A Common Table Expression (CTE) is a temporary named result set created using the WITH keyword.

CTEs improve readability and make complex queries easier to manage.

Common Table Expressions (CTE)

WITH high_value_orders AS (

    SELECT *
    FROM orders
    WHERE total > 1000
)

SELECT *
FROM high_value_orders;

The CTE creates a temporary result called high_value_orders.

Why Use CTEs?

CTEs provide several advantages:

  • Improve query readability.
  • Reduce repeated logic.
  • Make debugging easier.
  • Organize complex SQL statements.
  • Support recursive queries.

Multiple CTEs

You can define multiple CTEs in a single query.

WITH sales_total AS (

    SELECT customer_id,
           SUM(total) AS total_sales
    FROM orders
    GROUP BY customer_id
),

top_customers AS (

    SELECT *
    FROM sales_total
    WHERE total_sales > 5000
)

SELECT *
FROM top_customers;

This structure makes long queries much easier to understand.

Recursive CTEs

Recursive CTEs are used for hierarchical or tree-structured data.

WITH RECURSIVE numbers AS (

    SELECT 1 AS n

    UNION ALL

    SELECT n + 1
    FROM numbers
    WHERE n < 5
)

SELECT *
FROM numbers;

This query generates numbers from 1 to 5 recursively.

CTEs vs Subqueries

Both CTEs and subqueries solve similar problems, but they differ in readability and structure.

  • Subqueries are compact for small operations.
  • CTEs are better for complex multi-step logic.
  • CTEs improve maintainability.
  • Recursive operations require CTEs.

Performance Considerations

In some databases, CTEs may improve readability but not necessarily performance.

Database engines optimize queries differently, so performance testing is important for large datasets.

Real-World Use Cases

Subqueries and CTEs are heavily used in:

  • Business reporting
  • Analytics dashboards
  • Financial systems
  • Inventory management
  • Data warehousing
  • Hierarchical organizational data

Common Beginner Mistakes

  • Using subqueries when joins are simpler.
  • Writing deeply nested unreadable queries.
  • Forgetting aliases for subqueries.
  • Creating inefficient correlated subqueries.
  • Confusing recursive and non-recursive CTEs.

Best Practices

  • Use meaningful CTE names.
  • Keep queries readable and modular.
  • Use CTEs for multi-step transformations.
  • Test performance on large datasets.
  • Avoid unnecessary nested queries.

Practice

Practice

Create a subquery that returns employees earning more than the average salary.

Then create a CTE that selects products with stock greater than 100 and display the results.

Finally, write a recursive CTE that generates numbers from 1 to 10.

Summary

Subqueries and Common Table Expressions are powerful SQL tools used to build advanced and organized queries. Subqueries help solve problems step-by-step, while CTEs improve readability and support recursive operations.

Mastering these concepts is essential for writing professional SQL queries used in real-world applications and enterprise databases.

Lecture 07 · Core Concepts

Data Modification

Intermediate ~45 min

Introduction to Data Modification

Databases are not only used for retrieving data — they are also used to create, modify, and remove information.

SQL provides several commands for changing database records safely and efficiently.

  • INSERT — adds new data
  • UPDATE — modifies existing data
  • DELETE — removes data

INSERT, UPDATE, DELETE

INSERT INTO users (name, email)
VALUES ('Bob', 'bob@example.com');

UPDATE products
SET price = price * 1.1
WHERE category = 'Tech';

DELETE FROM logs
WHERE created_at < '2023-01-01';

Understanding INSERT

The INSERT INTO statement adds new rows into a table.

This command is commonly used in:

  • User registration systems
  • Order management systems
  • Inventory applications
  • Blog and CMS platforms

Basic INSERT Syntax

INSERT INTO students (name, age)
VALUES ('Alice', 21);

Here, a new row is inserted into the students table.

Inserting Multiple Rows

SQL allows inserting multiple records in a single query.

INSERT INTO products (name, price)
VALUES
('Mouse', 500),
('Keyboard', 1200),
('Monitor', 9000);

INSERT Without Specifying Columns

If values are provided in exact table order, column names can be omitted.

INSERT INTO users
VALUES (1, 'Ahmed', 'ahmed@example.com');

However, specifying column names is considered a better practice.

Understanding UPDATE

The UPDATE statement modifies existing records.

UPDATE employees
SET salary = 50000
WHERE id = 1;

This query updates the salary of the employee with ID 1.

Updating Multiple Columns

UPDATE users
SET
    city = 'Delhi',
    status = 'Active'
WHERE id = 5;

Importance of WHERE Clause

The WHERE clause determines which rows are affected.

UPDATE users
SET status = 'Inactive';

Without a WHERE clause, all rows in the table will be updated.

Understanding DELETE

The DELETE statement removes records from a table.

DELETE FROM customers
WHERE id = 10;

This removes the customer whose ID equals 10.

Deleting Multiple Records

DELETE FROM orders
WHERE status = 'Cancelled';

Deleting All Rows

Removing the WHERE clause deletes all records.

DELETE FROM logs;

Use this carefully because deleted rows may not be recoverable.

TRUNCATE vs DELETE

DELETE TRUNCATE
Removes selected rows Removes all rows
Supports WHERE clause No WHERE clause
Slower for large tables Faster operation

Using Conditions in Queries

Conditions help target specific rows during modification.

UPDATE products
SET stock = stock - 1
WHERE product_id = 100;

Working with NULL Values

UPDATE users
SET phone = NULL
WHERE id = 3;

Using Arithmetic Operations

SQL can perform calculations during updates.

UPDATE employees
SET salary = salary + 5000
WHERE department = 'IT';

Transactions and Safety

Transactions help ensure database consistency.

BEGIN;

UPDATE accounts
SET balance = balance - 1000
WHERE id = 1;

UPDATE accounts
SET balance = balance + 1000
WHERE id = 2;

COMMIT;

Transactions are especially useful in banking and payment systems.

Rollback Example

ROLLBACK;

Rollback cancels changes before they are permanently saved.

Best Practices

  • Always use WHERE with UPDATE and DELETE
  • Back up important data before modifications
  • Test queries on small datasets first
  • Use transactions for critical operations
  • Validate user input before database changes

Common Mistakes

  • Forgetting the WHERE clause
  • Inserting duplicate data accidentally
  • Deleting records without backup
  • Using incorrect column order
  • Ignoring transaction handling

Real-World Examples

  • User account registration
  • Updating product inventory
  • Deleting expired logs
  • Processing customer orders
  • Managing employee databases

Mini Practice

Try writing queries for the following:

  • Insert a new student record
  • Update a product price
  • Delete inactive users
  • Increase salaries by 10%
  • Use transactions with COMMIT and ROLLBACK
Lecture 08 · Core Concepts

Constraints & Indexes

Intermediate ~50 min

Introduction to Constraints

Constraints are rules applied to database tables to ensure data accuracy, consistency, and integrity.

They help prevent invalid or duplicate data from being inserted into a database.

Constraints are essential in professional database systems because they enforce business rules directly at the database level.

Why Constraints Matter

Without constraints, databases may contain:

  • Duplicate records
  • Missing required values
  • Invalid relationships
  • Incorrect data formats
  • Orphaned records

Constraints help maintain reliable and trustworthy data.

Ensuring Data Integrity

CREATE TABLE authors (

    id SERIAL PRIMARY KEY,

    name TEXT NOT NULL,

    email TEXT UNIQUE
);

This table demonstrates multiple important constraints:

  • PRIMARY KEY ensures unique row identification.
  • NOT NULL prevents empty values.
  • UNIQUE prevents duplicate emails.

Primary Key Constraint

A primary key uniquely identifies each row in a table.

CREATE TABLE students (

    student_id INT PRIMARY KEY,

    name TEXT
);

Primary keys cannot contain duplicate or NULL values.

Auto-Incrementing Keys

Databases often generate primary key values automatically.

id SERIAL PRIMARY KEY

The SERIAL keyword automatically increases values for each new row.

NOT NULL Constraint

The NOT NULL constraint requires a column to always contain a value.

CREATE TABLE products (

    name TEXT NOT NULL,

    price DECIMAL
);

This prevents incomplete records from being inserted.

UNIQUE Constraint

The UNIQUE constraint ensures all values in a column are different.

CREATE TABLE users (

    username TEXT UNIQUE
);

This is commonly used for usernames, emails, and account IDs.

DEFAULT Constraint

The DEFAULT constraint assigns automatic values when no value is provided.

CREATE TABLE orders (

    status TEXT DEFAULT 'Pending'
);

If no status is specified, the database automatically inserts Pending.

CHECK Constraint

The CHECK constraint validates data using conditions.

CREATE TABLE employees (

    age INT CHECK (age >= 18)
);

This prevents invalid age values from being inserted.

Foreign Key Constraint

Foreign keys create relationships between tables.

CREATE TABLE books (

    id SERIAL PRIMARY KEY,

    author_id INT REFERENCES authors(id)
);

The author_id must match an existing value in the authors table.

Referential Integrity

Foreign keys help maintain referential integrity by ensuring related records remain valid.

This prevents orphaned records and broken relationships.

Composite Keys

A composite key combines multiple columns into a single primary key.

CREATE TABLE enrollments (

    student_id INT,

    course_id INT,

    PRIMARY KEY (student_id, course_id)
);

This ensures each student-course combination is unique.

Introduction to Indexes

Indexes improve database query performance by allowing faster data retrieval.

Without indexes, databases may need to scan entire tables to locate records.

How Indexes Work

Indexes function similarly to a book index. Instead of reading every page, the database quickly locates the required information.

Indexes significantly improve search speed for large datasets.

Creating an Index

CREATE INDEX idx_author_name
ON authors(name);

This creates an index on the name column.

Indexes on Multiple Columns

Indexes can include multiple columns.

CREATE INDEX idx_employee
ON employees(last_name, first_name);

Multi-column indexes improve queries involving multiple search conditions.

Unique Indexes

A unique index prevents duplicate values while also improving search performance.

CREATE UNIQUE INDEX idx_email
ON users(email);

When to Use Indexes

Indexes are useful for columns frequently used in:

  • WHERE clauses
  • JOIN operations
  • ORDER BY clauses
  • GROUP BY statements

When Too Many Indexes Become a Problem

Although indexes improve reading speed, they can slow down insert, update, and delete operations.

Each index must also be updated whenever table data changes.

Clustered vs Non-Clustered Indexes

Some database systems support different index types:

  • Clustered Index: Organizes physical table data.
  • Non-Clustered Index: Stores separate index structures.

The exact behavior depends on the database system being used.

Viewing Existing Indexes

Most databases provide commands to inspect indexes.

SELECT *
FROM pg_indexes
WHERE tablename = 'authors';

Dropping an Index

Indexes can be removed if no longer needed.

DROP INDEX idx_author_name;

Real-World Uses

Constraints and indexes are critical in:

  • Banking systems
  • E-commerce applications
  • Inventory systems
  • Authentication systems
  • Large enterprise databases
  • Analytics platforms

Benefits of Constraints & Indexes

  • Improve data integrity.
  • Prevent invalid records.
  • Enforce business rules.
  • Increase query performance.
  • Support scalable applications.

Common Beginner Mistakes

  • Creating too many unnecessary indexes.
  • Forgetting foreign key relationships.
  • Using NULL values incorrectly.
  • Ignoring unique constraints on important fields.
  • Adding indexes to very small tables unnecessarily.

Best Practices

  • Always define primary keys.
  • Use foreign keys for related tables.
  • Index frequently searched columns.
  • Validate important business rules using constraints.
  • Regularly monitor query performance.

Practice

Practice

Create a table called customers with:

  • A primary key column
  • A unique email column
  • A NOT NULL name column
  • A CHECK constraint for age greater than 18

Then create an index on the email column.

Summary

Constraints and indexes are essential database features used to maintain reliable data and improve performance. Constraints enforce data integrity rules, while indexes optimize query execution speed.

Understanding these concepts is critical for designing scalable, secure, and efficient database systems.

Lecture 09 · Advanced

Transactions & ACID

Advanced ~50 min

All or Nothing

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
Lecture 10 · Advanced

Stored Procedures

Advanced ~60 min

Functions and procedures allow you to save SQL logic on the database server.

Lecture 11 · Advanced

Window Functions

Advanced ~65 min

Advanced Analytics

SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) as ranking
FROM employees;
Lecture 12 · Capstone

Capstone Project: E-commerce DB

Advanced ~150 min

Design and implement a database schema for an e-commerce platform, including products, users, orders, and reviews.

-- Project goals:
-- 1. Create tables with proper constraints
-- 2. Populate with sample data
-- 3. Write complex queries for sales reports
-- 4. Optimize queries with indexes
Lecture 13 · Advanced

Database Design & Normalization

Intermediate ~50 min Requires: Lecture 12

Content coming soon...

Lecture 14 · Advanced

Performance & Query Optimization

Advanced ~55 min Requires: Lecture 13

Content coming soon...

Lecture 15 · Advanced

Security & Access Control

Advanced ~45 min Requires: Lecture 14

Content coming soon...

Lecture 16 · Professional

Final Project — Enterprise Database

Advanced ~90 min Requires: All Previous

Content coming soon...