Home Posts SQL Optimization [2026] Window Functions & CTE Rewrites
Developer Reference

SQL Optimization [2026] Window Functions & CTE Rewrites

SQL Optimization [2026] Window Functions & CTE Rewrites
Dillip Chowdary
Dillip Chowdary
Tech Entrepreneur & Innovator · April 28, 2026 · 11 min read

Bottom Line

Most slow analytic SQL is not fixed with one magic index. It gets faster when you reduce repeated scans, narrow sort work, and rewrite window and CTE patterns so the optimizer has less work to do.

Key Takeaways

  • Use QUALIFY or a single outer filter to avoid extra ranking subqueries.
  • Default window frames often surprise LAST_VALUE; set ROWS explicitly.
  • A reused CTE may be recomputed; materialize once when the engine will not reuse it.
  • Pre-aggregate before joins to reduce shuffle, sort, and spill pressure.
  • Benchmark rewrites with the plan, not intuition: scan bytes, sorts, spills, and row counts.

High-volume analytics queries usually fail in predictable ways: repeated scans hidden behind readable CTEs, window functions that sort more data than necessary, and ranking patterns that force extra passes through already large intermediates. This cheat sheet focuses on practical rewrites you can apply fast, then verify with the plan. The goal is simple: reduce bytes read, sort width, spill risk, and duplicate work before you touch heavier physical design changes.

Key Takeaways

  • Use QUALIFY or a single outer filter to avoid extra ranking subqueries.
  • Default window frames often surprise LAST_VALUE; set ROWS explicitly.
  • A reused CTE may be recomputed; materialize once when the engine will not reuse it.
  • Pre-aggregate before joins to reduce shuffle, sort, and spill pressure.
  • Benchmark rewrites with the plan, not intuition: scan bytes, sorts, spills, and row counts.

Quick Reference

Bottom Line

Rewrite for less work before you tune for more horsepower. In practice, the biggest wins come from shrinking intermediate result sets and stopping the planner from doing the same expensive step twice.

Press / to focus search, Esc to clear, j and k to jump sections, and c to copy the next visible code block.

Shortcut Action Use It For
/ Focus filter Jump straight to a rewrite pattern
Esc Clear filter Reset the cheat sheet view
j Next section Move through the reference without scrolling
k Previous section Review the prior block quickly
c Copy code Grab the next visible snippet
  • Use windows when you need row context without collapsing rows.
  • Use grouped aggregates when you can reduce cardinality early.
  • Use temp tables or materialized stages when the same heavy subquery feeds multiple branches.
  • Before sharing sample data or plans with a team, scrub sensitive fields with the Data Masking Tool.

Commands by Purpose

Filter top-N per partition

  • Best when you need one row or a small top-N slice per key.
  • Prefer QUALIFY where supported because it removes one nesting layer.
  • If QUALIFY is unavailable, keep the ranking in one subquery and filter once.
SELECT
  customer_id,
  order_id,
  event_ts
FROM orders
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY event_ts DESC
) = 1;

Replace row-to-row self-joins

  • Self-joins can explode row counts and add unnecessary join work.
  • LAG and LEAD usually preserve intent with less shuffle.
SELECT
  session_id,
  event_ts,
  event_value - LAG(event_value) OVER (
    PARTITION BY session_id
    ORDER BY event_ts
  ) AS delta
FROM events;

Fix misleading default frames

  • The default frame is often fine for ranking, but not for every value function.
  • LAST_VALUE is a common trap because the default frame may stop at the current peer group.
  • Set ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when you want the partition-wide last row.
SELECT
  account_id,
  event_ts,
  LAST_VALUE(status) OVER (
    PARTITION BY account_id
    ORDER BY event_ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS final_status
FROM account_events;

Pre-aggregate before the join

  • Large fact-to-dimension joins get cheaper when you collapse the fact side first.
  • This is especially useful when downstream logic only needs grouped metrics, not raw rows.
WITH order_totals AS (
  SELECT
    customer_id,
    region_id,
    SUM(amount) AS gross
  FROM fact_orders
  WHERE order_date >= @start_date
  GROUP BY customer_id, region_id
)
SELECT
  ot.customer_id,
  dr.region_name,
  ot.gross
FROM order_totals ot
JOIN dim_region dr
  ON dr.region_id = ot.region_id;

Inline or materialize a CTE deliberately

  • Readable WITH clauses do not guarantee reuse.
  • In BigQuery documentation, WITH is described as a readability feature, not a performance guarantee.
  • In PostgreSQL, eligible WITH queries can be inlined, and NOT MATERIALIZED can force that behavior.
WITH filtered AS NOT MATERIALIZED (
  SELECT
    user_id,
    event_date,
    revenue
  FROM fact_events
  WHERE event_date >= @start_date
)
SELECT
  user_id,
  SUM(revenue) AS total_revenue
FROM filtered
GROUP BY user_id;
CREATE TEMP TABLE heavy_stage AS
SELECT
  customer_id,
  region_id,
  SUM(amount) AS gross
FROM fact_orders
WHERE order_date >= @start_date
GROUP BY customer_id, region_id;

SELECT hs.customer_id, dr.region_name, hs.gross
FROM heavy_stage hs
JOIN dim_region dr
  ON dr.region_id = hs.region_id;

Configuration

Planner-friendly defaults

  • Align physical partitioning or clustering with the columns used in your largest filters.
  • Keep window partitions narrow where possible; wide partitions multiply sort pressure.
  • Prefer stable sort keys with good cardinality for ranking and dedupe patterns.
  • Persist expensive transformations if the same shaped data feeds many dashboards or jobs.
  • Keep SQL readable; if you share snippets publicly, clean them with the Code Formatter after the rewrite is stable.

What to inspect in the plan

Signal What it usually means Rewrite first
Large sort stage Window or final ordering is wider than it needs to be Project fewer columns and reduce partition size
Repeated scan of same branch CTE or subquery is being reevaluated Materialize once or inline to enable pushdown
Row explosion after join Join occurs before dedupe or aggregation Pre-aggregate, filter earlier, or rewrite self-join
Spill to disk Sort or hash state exceeds memory budget Shrink intermediate rows and split the workload
Watch out: A readable CTE chain can hide the real cost center. The plan, not the indentation, tells you whether the branch was pushed down, scanned twice, or sorted too early.

Advanced Usage

Use QUALIFY to keep ranking local

  • When your engine supports it, QUALIFY filters window results after the window step without forcing an extra wrapper query.
  • It is especially useful for latest-row, top-N, and dedupe patterns.

Trade exact ranking for approximate bucketing when scale demands it

  • BigQuery documents that heavy NTILE usage can require a global ORDER BY and may hit resource limits.
  • For percentile-style grouping at massive scale, approximate quantiles can be the cheaper path if exact bucket boundaries are not mandatory.
SELECT
  APPROX_QUANTILES(latency_ms, 100) AS latency_percentiles
FROM api_requests;

Split logic when the optimizer cannot

  • If one monster query both transforms and serves analytics, separate the expensive stage from the final read path.
  • Temp tables, scheduled summary tables, and materialized views are often easier to operate than one heroic statement.
Pro tip: Treat every rewrite as a hypothesis. Measure bytes scanned, rows shuffled, spill, and elapsed time before you decide the query is truly faster.

Plan Checks

Fast verification loop

  1. Capture the baseline plan and runtime metrics.
  2. Rewrite one pattern at a time: ranking, frame, join order, or CTE reuse.
  3. Re-run with the same date range and concurrency conditions.
  4. Compare scan bytes, sort stages, spill, and output row counts.
  5. Keep the simpler query only if the plan is measurably better.
EXPLAIN ANALYZE
SELECT
  customer_id,
  SUM(amount) AS gross
FROM fact_orders
WHERE order_date >= @start_date
GROUP BY customer_id;
  • For shared benchmarks, keep the same predicates, same warehouse or slot conditions, and same output shape.
  • If the rewrite changes semantics, stop calling it an optimization and document it as a behavior change.

Frequently Asked Questions

How do I make window functions faster on large analytics tables? +
Start by reducing the number of rows and columns that reach the window step. Push filters earlier, pre-aggregate when raw detail is not needed, and make the PARTITION BY and ORDER BY keys as narrow and selective as possible.
Are CTEs always materialized in modern SQL engines? +
No. Engine behavior differs, and many optimizers treat WITH clauses as a readability feature rather than a guaranteed cache. If a heavy CTE is referenced multiple times, verify the plan and consider a temp table or an engine-specific materialization hint.
Why does LAST_VALUE return the wrong-looking result? +
The issue is usually the window frame, not the function itself. Add an explicit ROWS frame, often ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, when you want the final row from the whole partition.
When should I replace a self-join with LAG or LEAD? +
Replace it when you are comparing each row to a prior or next row in the same business key. LAG and LEAD preserve row context without the join fan-out that often makes self-joins expensive.

Get Engineering Deep-Dives in Your Inbox

Weekly breakdowns of architecture, security, and developer tooling — no fluff.

Found this useful? Share it.