SQL Doesn't Run in the Order You Write It

E very developer who has written SQL has made the same mistake at least once: used a SELECT alias in a WHERE clause, or tried to filter on an aggregate without HAVING, and gotten an error that felt completely arbitrary. It is not arbitrary. It is the database telling you that you wrote the query in one order and expected it to execute in that order. It does not.

Understanding the actual execution order is one of those things that turns SQL from a language you fight into a language you reason with.

The Order the Database Actually Uses#

When a query runs, the engine processes clauses in this sequence — regardless of how you wrote them:

1. FROM + JOIN   -- build the full working dataset
2. WHERE         -- filter rows before any grouping
3. GROUP BY      -- collapse rows into groups
4. HAVING        -- filter groups after aggregation
5. SELECT        -- compute output columns, assign aliases
6. DISTINCT      -- deduplicate computed output
7. ORDER BY      -- sort the final result
8. LIMIT/OFFSET  -- slice rows from the sorted result

%%{init: {'themeVariables': {'fontSize': '16px'}, 'flowchart': {'padding': 25, 'nodeSpacing': 50, 'rankSpacing': 80}}}%% flowchart LR A["FROM + JOIN\nBuild the dataset"] --> B["WHERE\nFilter rows"] B --> C["GROUP BY\nCollapse into groups"] C --> D["HAVING\nFilter groups"] D --> E["SELECT\nCompute columns"] E --> F["DISTINCT\nDeduplicate"] F --> G["ORDER BY\nSort results"] G --> H["LIMIT / OFFSET\nSlice the output"] style A fill:#1e3a5f,color:#e8edf2,stroke:#2d6a9f,stroke-width:1.5px style B fill:#1e3a5f,color:#e8edf2,stroke:#2d6a9f,stroke-width:1.5px style C fill:#1e3a5f,color:#e8edf2,stroke:#2d6a9f,stroke-width:1.5px style D fill:#1e3a5f,color:#e8edf2,stroke:#2d6a9f,stroke-width:1.5px style E fill:#2d5a27,color:#e8edf2,stroke:#4a9e3f,stroke-width:1.5px style F fill:#2d5a27,color:#e8edf2,stroke:#4a9e3f,stroke-width:1.5px style G fill:#5a3a1e,color:#e8edf2,stroke:#9e6a2d,stroke-width:1.5px style H fill:#5a3a1e,color:#e8edf2,stroke:#9e6a2d,stroke-width:1.5px

The written order — SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT — is the opposite in most places. The database parses it in written order but executes it in logical order.

What This Means in Practice#

WHERE cannot see SELECT aliases#

SELECT runs after WHERE. So at the time WHERE evaluates, aliases defined in the select list do not exist yet.

-- fails: 'discounted' is not defined when WHERE runs
SELECT price * 0.9 AS discounted
FROM products
WHERE discounted < 100;

-- works: repeat the expression
SELECT price * 0.9 AS discounted
FROM products
WHERE price * 0.9 < 100;

Most databases will reject the first form outright. A few (like BigQuery) allow alias references in WHERE as a non-standard extension — but relying on that makes your SQL non-portable.

WHERE cannot filter aggregates#

WHERE runs before GROUP BY, so aggregate functions like COUNT, SUM, and AVG have not been computed yet when WHERE fires. This is exactly what HAVING is for.

-- fails: COUNT(*) doesn't exist at WHERE time
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE COUNT(*) > 5
GROUP BY customer_id;

-- correct
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

A useful rule: if your filter references an aggregate function, it belongs in HAVING. If it references a raw column, it belongs in WHERE. Putting row filters in WHERE instead of HAVING is also faster — the database eliminates rows before grouping rather than after.

ORDER BY can see SELECT aliases#

ORDER BY runs after SELECT, so aliases are fully in scope. This is one of the few places you can use them.

-- valid: ORDER BY runs after SELECT
SELECT price * 0.9 AS discounted
FROM products
ORDER BY discounted DESC;

This works in all major databases. It is not a trick — it is a direct consequence of the execution order.

LIMIT cuts after sorting#

LIMIT is the last thing that executes. The database sorts the entire result set first, then takes the top N rows. This means ORDER BY ... LIMIT 10 reliably gives you the correct top 10 — the sort is not skipped or approximated.

On large tables, ORDER BY without an index forces a full sort before LIMIT can cut it down. If you are paginating a large dataset, make sure the column you are ordering by is indexed. Without it, you pay the sort cost on every page.

Subqueries and CTEs as a Workaround#

When you need to filter on a SELECT alias — say, a computed value you do not want to repeat — the cleanest approach is a subquery or CTE. Both establish a new logical scope where the computed column becomes a real column.

-- CTE: compute once, filter in the outer query
WITH priced AS (
    SELECT *, price * 0.9 AS discounted
    FROM products
)
SELECT *
FROM priced
WHERE discounted < 100;

The CTE runs as its own complete query. When the outer WHERE executes, discounted already exists as a real column in priced. Same execution rules apply — but you have stacked two complete logical scopes on top of each other.

DISTINCT Is Not a Function#

A common misconception is that DISTINCT is a modifier on a single column, like SELECT DISTINCT(status). It is not. DISTINCT applies to the entire row — it deduplicates based on the combination of all selected columns after SELECT computes them.

-- these are identical — the parens do nothing
SELECT DISTINCT status FROM orders;
SELECT DISTINCT(status) FROM orders;

-- DISTINCT applies to both columns together
SELECT DISTINCT customer_id, status FROM orders;
-- returns one row per unique (customer_id, status) pair, not per customer_id

This trips people up when they expect DISTINCT customer_id to deduplicate across all rows and then return other columns too. That is GROUP BY, not DISTINCT.

The Mental Model#

Think of a SQL query as a pipeline. Each clause hands its output to the next:

Table(s) 
  → JOIN produces the full combined dataset
  → WHERE drops rows that don't match
  → GROUP BY collapses remaining rows into groups
  → HAVING drops groups that don't match
  → SELECT computes the output columns from what's left
  → DISTINCT removes duplicate output rows
  → ORDER BY sorts the result
  → LIMIT returns the first N rows

At each step, you can only work with what the previous step gave you. WHERE gets raw rows. HAVING gets grouped aggregates. ORDER BY gets fully computed output. Once that sequence is clear, the rules stop feeling like arbitrary restrictions and start feeling obvious.

SQL Doesn’t Run in the Order You Write It

The Order the Database Actually Uses#

What This Means in Practice#

WHERE cannot see SELECT aliases#

WHERE cannot filter aggregates#

ORDER BY can see SELECT aliases#

LIMIT cuts after sorting#

Subqueries and CTEs as a Workaround#

DISTINCT Is Not a Function#

The Mental Model#