Query Optimization in PostgreSQL with EXPLAIN ANALYZE

#postgresql #performance #query-plans #tutorial

If a query is slow in PostgreSQL, the most reliable way to fix it is to understand its execution plan. EXPLAIN shows you the plan PostgreSQL intends to use; EXPLAIN ANALYZE runs the query and reports what actually happened. This article walks you through reading plans, spotting common bottlenecks, and applying practical fixes.

EXPLAIN vs EXPLAIN ANALYZE

  • EXPLAIN: Displays the planned execution strategy without running the query. Fast, safe, but theoretical.
  • EXPLAIN ANALYZE: Runs the query and appends runtime metrics (actual time, rows, loops). Accurate, but it executes the query; for mutating statements, use test data or wrap the statement in a transaction you can roll back, as sketched below.
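
For mutating statements, a minimal sketch of that transaction pattern (the UPDATE here is hypothetical):

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE accounts SET balance_cents = balance_cents + 100 WHERE id = 42;
ROLLBACK; -- the plan and timings are still reported, but the change is discarded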

Useful options:

  • ANALYZE: Execute the query and measure.
  • BUFFERS: Show buffer hits and reads; great for I/O diagnosis.
  • WAL: Show WAL generated (useful for writes).
  • TIMING and SUMMARY: Per-node timing detail and overall planning/execution totals; both default to on when ANALYZE is used.
  • SETTINGS: Show relevant GUCs used.
  • FORMAT JSON: Machine-readable output for visualizers.

Example:

EXPLAIN (ANALYZE, BUFFERS, WAL, SUMMARY)
SELECT ...

A reproducible workflow

  1. Identify slow queries

    • Enable pg_stat_statements and find the top queries by total_exec_time or mean_exec_time (named total_time/mean_time before PostgreSQL 13); see the query sketch after this list.
    • Optionally enable auto_explain to log plans for slow statements.
  2. Reproduce with realistic parameters

    • Use the same filters and joins your application uses.
    • Avoid LIMIT 10 test cases if production typically returns thousands of rows.
  3. Get a measured plan

    • Run ANALYZE on involved tables to refresh stats.
    • Run EXPLAIN (ANALYZE, BUFFERS, SUMMARY).
  4. Interpret and iterate

    • Compare estimated vs actual rows per node.
    • Fix stats, indexing, or query shape; re-run and verify.
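
For step 1, a sketch of finding the top statements with pg_stat_statements, assuming PostgreSQL 13 or later (older releases expose total_time/mean_time instead):

-- Top 10 statements by cumulative execution time (milliseconds)
SELECT queryid,
       calls,
       round(total_exec_time) AS total_ms,
       round(mean_exec_time)  AS mean_ms,
       left(query, 60)        AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;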

Reading a plan: the essentials

Each node in a plan reports:

  • cost=startup..total: Planner’s abstract cost. Lower is better, but this is not time.
  • rows: Estimated rows output by that node.
  • width: Estimated average row size in bytes.
  • actual time=first..last rows=n loops=m: Measured startup and total time plus row count, averaged per loop; multiply by loops for the node's totals.
  • Buffers: shared/local blocks hit and read; high reads = I/O heavy.
  • Parallel: workers planned/used, leader participation.
  • Sort Method: In-memory vs external (disk).
  • JIT: Compilation overhead and gain; helpful for large scans/aggregations.

Key node types:

  • Seq Scan: Full table scan. Fine for tiny tables or very broad filters; often a red flag otherwise.
  • Index Scan / Index Only Scan: Uses a btree or other index; the latter can avoid table hits if visibility map allows.
  • Bitmap Index/Heap Scan: Efficient for many scattered matches.
  • Nested Loop, Hash Join, Merge Join: Different join strategies; choice depends on row counts, ordering, and memory.

Rule of thumb: Large mismatches between estimated rows and actual rows are the root cause of many bad plans.
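
An illustrative node with such a mismatch (numbers are made up): the planner expected 120 rows but the node produced 98,000, which can steer it toward a join strategy that only makes sense for a handful of rows:

->  Seq Scan on events  (cost=0.00..2511.00 rows=120 width=16) (actual time=0.031..412.508 rows=98000 loops=1)
      Filter: (tenant_id = 42)
      Rows Removed by Filter: 12000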

Example: filtering, joining, ordering, and LIMIT

Suppose:

-- Hypothetical schema
CREATE TABLE customers (
  id bigint PRIMARY KEY,
  email text NOT NULL,
  status text NOT NULL
);

CREATE TABLE orders (
  id bigint PRIMARY KEY,
  customer_id bigint NOT NULL REFERENCES customers(id),
  created_at timestamptz NOT NULL,
  total_cents integer NOT NULL
);

CREATE INDEX ON orders (created_at);

Query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.created_at, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= now() - interval '30 days'
  AND c.status = 'active'
ORDER BY o.created_at DESC
LIMIT 50;

A plausible plan:

Limit
  ->  Sort  (cost=... rows=... width=...) (actual time=18.351..18.369 rows=50 loops=1)
        Sort Key: o.created_at DESC
        Sort Method: top-N heapsort  Memory: 38kB
        Buffers: shared hit=12045 read=512
        ->  Nested Loop  (cost=... rows=... width=...) (actual time=0.120..17.900 rows=8642 loops=1)
              ->  Index Scan using orders_created_at_idx on orders o
                    Index Cond: (created_at >= now() - '30 days'::interval)
                    Buffers: shared hit=7023 read=480
              ->  Index Scan using customers_pkey on customers c
                    Index Cond: (id = o.customer_id)
                    Filter: (status = 'active')
                    Rows Removed by Filter: 1
                    Buffers: shared hit=5022 read=32
Planning Time: 0.451 ms
Execution Time: 18.420 ms

Observations and improvements:

  • The Sort is doing a top-N heapsort. If you frequently need the latest rows, an index that matches ORDER BY helps avoid sorting:

    CREATE INDEX ON orders (created_at DESC) INCLUDE (customer_id);

    With this index (or by scanning the existing created_at index in descending order), PostgreSQL can walk the index in the required order and stop after 50 qualifying rows, eliminating the sort and much of the extra read work.

  • If many customers are inactive, the join wastes time fetching then filtering. A partial index can speed the filter on customers:

    CREATE INDEX ON customers (id) WHERE status = 'active'; -- often unnecessary if id is PK
    CREATE INDEX ON customers (status, id); -- if you filter by status frequently

    Typically the PK on customers(id) already exists; the main gain comes from making the orders index support the ordering.

  • If Sort shows “external merge Disk: …”, it spilled to disk. You can:

    • Add or adjust the index to avoid the sort.
    • Increase work_mem for this session or transaction:
      SET LOCAL work_mem = '256MB';
      Note: work_mem applies per sort/hash operation per parallel worker. Use judiciously.

Common bottlenecks and how to fix them

  1. Sequential scans due to missing or unusable indexes
  • Symptom: Seq Scan with large Buffers read, slow runtime.
  • Fix:
    • Create a btree index that matches your WHERE conditions’ leading columns:
      CREATE INDEX ON events (tenant_id, created_at);
    • For expressions, use expression indexes:
      CREATE INDEX ON users ((lower(email)));
      -- Query must use the same expression:
      SELECT ... WHERE lower(email) = 'a@b.com';
    • For array membership or full-text-like search, use GIN:
      CREATE EXTENSION IF NOT EXISTS pg_trgm;
      CREATE INDEX ON products USING gin (name gin_trgm_ops);
      SELECT ... WHERE name ILIKE '%sneaker%';
  2. Bad cardinality estimates
  • Symptom: actual rows differ from estimated rows by orders of magnitude, leading to wrong join choice or join order.
  • Fix:
    • Refresh stats:
      ANALYZE your_table;
    • Increase stats target for skewed columns:
      ALTER TABLE your_table ALTER COLUMN status SET STATISTICS 1000;
      ANALYZE your_table;
    • Use extended statistics for correlated columns (PostgreSQL 10+):
      CREATE STATISTICS users_region_stats (dependencies, ndistinct) 
        ON country, state FROM users;
      ANALYZE users;
  3. Inefficient join strategies
  • Symptom: Nested Loop with huge inner loops or repeated scans; or Hash Join with hash table spill.
  • Fix:
    • Ensure indexes on join keys of the probed side:
      CREATE INDEX ON line_items (order_id);
    • Allow Hash Join to succeed in memory by increasing work_mem when appropriate.
    • Rewrite filters to reduce early rows before joining.
    • Verify that predicates are sargable (avoid wrapping join keys in functions that prevent index usage).
  4. Sorting and grouping
  • Symptom: Sort or GroupAggregate spills to disk; long runtimes.
  • Fix:
    • Add a supporting index for ORDER BY or GROUP BY prefix.
    • Increase work_mem for large sorts/aggregations.
    • Consider pre-aggregation or materialized summaries if appropriate.
  5. CTEs and materialization
  • PostgreSQL 12+ inlines CTEs by default (when a CTE has no side effects and is referenced only once). If you forced MATERIALIZED, you might be blocking the planner.
  • Use NOT MATERIALIZED or inline the CTE if you want the planner to reorder joins and push predicates:
    WITH t AS NOT MATERIALIZED (SELECT ... )
    SELECT ... FROM t JOIN ...
  6. OR conditions and unions
  • OR across different columns can defeat index usage (PostgreSQL can sometimes combine indexes with a BitmapOr, but not always).
  • Options:
    • Multi-column indexes when the same leading column is used.
    • Use UNION ALL to split predicates so each branch uses an index (see the sketch after this list).
    • Consider GIN for JSONB containment or arrays.
  7. Pagination
  • OFFSET/LIMIT gets slower as OFFSET grows because rows are still scanned and discarded.
  • Prefer keyset pagination:
    SELECT ... 
    FROM posts 
    WHERE (created_at, id) < ($cursor_created_at, $cursor_id)
    ORDER BY created_at DESC, id DESC
    LIMIT 50;
    Add a matching index:
    CREATE INDEX ON posts (created_at DESC, id DESC);
  8. Index-only scans and visibility
  • If a query reads only indexed columns, PostgreSQL can use Index Only Scan if the visibility map is set for pages.
  • Keep autovacuum running so visibility map remains accurate; consider VACUUM if needed.
  9. Write-heavy hotspots
  • For INSERT/UPDATE/DELETE, EXPLAIN (ANALYZE, WAL, BUFFERS) reveals WAL and buffer churn.
  • Reduce unnecessary indexes on hot tables; each index adds write cost.
  • Use appropriate fillfactor to reduce page splits on frequently updated tables.
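
A sketch of the UNION ALL rewrite from item 6, assuming a hypothetical tickets table with separate indexes on assignee_id and reporter_id:

-- Each branch can use its own index; a single OR across the two columns often cannot.
SELECT id, subject FROM tickets WHERE assignee_id = 42
UNION ALL
SELECT id, subject FROM tickets WHERE reporter_id = 42
  AND assignee_id IS DISTINCT FROM 42;  -- keep the branches disjoint

The extra predicate in the second branch prevents duplicate rows, so UNION ALL is safe and no deduplicating sort or hash is needed.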

Diagnosing with BUFFERS and I/O

  • Buffers: shared hit vs read helps separate CPU-bound from I/O-bound queries. High read means data was not in cache.
  • track_io_timing = on lets you see time spent reading vs writing.
  • For merges and sorts, “external” indicates disk spill; increase work_mem or change plan shape.
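
A minimal sketch of an I/O-focused run with BUFFERS and track_io_timing, reusing the orders table from the earlier example (track_io_timing normally requires superuser privileges or a postgresql.conf change):

SET track_io_timing = on;  -- adds "I/O Timings" lines to BUFFERS output
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM orders
WHERE created_at >= now() - interval '30 days';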

JIT, parallelism, and config nudges

  • JIT can help CPU-bound large scans/aggregations; it adds overhead for small queries. Tune jit_above_cost and relatives if needed.
  • Parallelism helps large scans; look for “Parallel Seq Scan” or “Gather”. Ensure max_parallel_workers_per_gather > 0 and queries are large enough to benefit.
  • Cost parameters matter:
    • On SSDs, random_page_cost close to seq_page_cost (e.g., 1.1 vs 1.0) often yields better index usage.
    • effective_cache_size should reflect OS cache to inform planner about cached data.
  • Avoid forcing enable_seqscan/enable_nestloop off globally; use only for experiments.
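
A sketch of session-level experiments with the cost parameters above (values are illustrative; promote them to postgresql.conf only after testing):

SET random_page_cost = 1.1;         -- closer to seq_page_cost on SSD-backed storage
SET effective_cache_size = '24GB';  -- planner hint about OS cache; allocates no memory
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the query under test and compare plans
RESET random_page_cost;
RESET effective_cache_size;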

Case study: turning a bad Nested Loop into a fast Hash Join

Before:

Nested Loop
  -> Seq Scan on orders  (actual rows=500000)
  -> Index Scan on line_items (order_id = orders.id) (actual rows=10 per loop)
Execution Time: 12.5 s

Issues:

  • Massive outer scan; repeated inner lookups.

Fix options:

  • Push selective filter into orders, or add an index to make orders selective:
    CREATE INDEX ON orders (status, created_at);
  • Or allow a Hash Join by filtering early and giving enough work_mem:
    SET LOCAL work_mem = '256MB';
    SELECT ... FROM orders o
    JOIN line_items li ON li.order_id = o.id
    WHERE o.status = 'completed' AND o.created_at >= now() - interval '7 days';

After:

Hash Join
  -> Seq Scan on orders (actual rows=18000)
  -> Hash on line_items (actual rows=180000)
Execution Time: 320 ms

Statistics: when the planner guesses wrong

If you see estimates off by 10x or more:

  • Run ANALYZE on the tables.
  • Raise statistics target for highly skewed columns.
  • Add extended stats for correlated predicates.
  • Avoid non-sargable predicates that hide selectivity (e.g., WHERE date(created_at) = …; prefer range predicates WHERE created_at >= … AND created_at < …).
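
For the last point, a before/after sketch using the orders table from the earlier example:

-- Non-sargable: wrapping the column hides selectivity and blocks the created_at index
SELECT count(*) FROM orders WHERE created_at::date = date '2024-06-01';

-- Sargable: a half-open range can use the index and is estimated from the column's stats
SELECT count(*) FROM orders
WHERE created_at >= timestamptz '2024-06-01'
  AND created_at <  timestamptz '2024-06-02';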

Tools that help

  • pg_stat_statements: find the real top offenders.
  • auto_explain: capture plans for slow queries in logs.
  • Visualizers: explain.depesz.com and pev2 (explain.dalibo.com) accept JSON plans.
  • pgBadger: analyze Postgres logs for trends.
  • pg_hint_plan exists but prefer proper statistics and indexing first.
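
A minimal configuration sketch for the first two (shared_preload_libraries requires a restart; the values here are just reasonable starting points):

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements, auto_explain'
auto_explain.log_min_duration = '500ms'   # log plans for statements slower than this
auto_explain.log_analyze = on             # include actual rows and timings (adds overhead)
auto_explain.log_buffers = on

-- then, once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;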

A quick checklist

  • Is there a supporting index for the WHERE and ORDER BY?
  • Are estimated rows close to actual rows on major nodes?
  • Are sorts or hashes spilling to disk?
  • Are joins using appropriate methods, and do join keys have indexes?
  • Did you ANALYZE recently and, if needed, increase statistics target or add extended stats?
  • Is the query shape sargable, avoiding function-wrapped columns and enabling index usage?
  • Can you reduce scanned rows earlier or switch to keyset pagination?

Conclusion

EXPLAIN ANALYZE turns guesswork into data. By reading plans, validating estimates, and aligning indexes and query shapes with what the planner expects, you can eliminate the vast majority of performance problems in PostgreSQL. Start with accurate statistics and the right indexes, confirm with EXPLAIN (ANALYZE, BUFFERS), and iterate until estimates and reality match.