Practical Database Indexing: Real Examples and Benchmarks
#databases #indexing #performance #postgresql #mysql
Indexing is the most cost-effective way to make database-backed applications fast. This article gives you practical examples and reproducible benchmarks for common query patterns, with a focus on PostgreSQL and notes for MySQL users.
What you’ll get:
- A realistic dataset you can generate locally
- Common query patterns and which indexes help
- Before/after EXPLAIN ANALYZE with indicative timings
- Guidance for Postgres-specific features (partial, covering, GIN) and MySQL equivalents
- A checklist for measuring impact without fooling yourself
Your exact numbers will vary based on hardware, data distribution, cache warmth, and versions. The goal is to show proportions and methods, not absolute records.
Benchmarks setup
- Database: PostgreSQL 16, default settings except increased shared_buffers (2–4 GB) and effective_cache_size (half RAM)
- Hardware: 8 vCPU, 16 GB RAM, NVMe SSD
- Data volume: 10 million rows
- Methodology:
- Run EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE) three times and take the best warm-cache value
- Note a cold-cache run after dropping OS file cache when possible
- Ensure ANALYZE has run after bulk loads (VACUUM ANALYZE or just ANALYZE)
- Track write overhead via pg_stat_statements and insert throughput (measurement snippet below)
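A minimal measurement snippet; the query shown is only a placeholder, and pg_stat_statements must already be installed in the database:
-- Warm-cache timing: run the statement under test three times, keep the best
EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE)
SELECT count(*) FROM events WHERE user_id = 424242;
-- Reset statement stats before a bulk load so write overhead is easy to attribute
SELECT pg_stat_statements_reset();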
Dataset: events
A single wide table that captures common access patterns.
PostgreSQL DDL:
CREATE TABLE events (
id bigserial PRIMARY KEY,
user_id int NOT NULL,
occurred_at timestamptz NOT NULL,
event_type smallint NOT NULL, -- e.g., 1=login, 2=purchase, ...
country_code char(2) NOT NULL,
payload text, -- optional searchable text
metadata jsonb -- flexible attributes
);
-- Synthetic data: ~10M rows over ~365 days; user_id is spread roughly uniformly across ~900k users.
-- Adjust generate_series ranges to taste.
INSERT INTO events (user_id, occurred_at, event_type, country_code, payload, metadata)
SELECT
(1000 + (random()*900000))::int, -- many users, roughly uniform; use power(random(), n) for skew
now() - ((random()*365)::int || ' days')::interval
- ((random()*86400)::int || ' seconds')::interval,
(1 + (random()*5)::int)::smallint,
(ARRAY['US','DE','IN','BR','GB','FR','CA','AU','ES','IT'])[1 + (random()*9)::int],
substr(md5(random()::text), 1, 60),
jsonb_build_object(
'plan', (ARRAY['free','pro','team'])[1 + (random()*2)::int],
'device', (ARRAY['ios','android','web'])[1 + (random()*2)::int],
'ref', substr(md5(random()::text), 1, 8)
)
FROM generate_series(1, 10000000);
ANALYZE events;
Query pattern 1: fetch latest items per user (ORDER BY + LIMIT)
Typical UI: “Latest 10 user events”.
Query:
SELECT id, occurred_at, event_type, country_code
FROM events
WHERE user_id = 424242
ORDER BY occurred_at DESC
LIMIT 10;
Baseline (no index on user_id):
- Plan: Seq Scan on events, filter user_id, then sort top-N
- Indicative warm time on 10M rows: 800–1500 ms
- Buffers: heavy heap read; sorts each time
Add a composite, descending, covering index:
-- Postgres: INCLUDE makes it covering without affecting sort keys
CREATE INDEX CONCURRENTLY idx_events_user_time_desc_cover
ON events (user_id, occurred_at DESC)
INCLUDE (event_type, country_code);
After index:
- Plan: Index Only Scan using idx_events_user_time_desc_cover
- Indicative warm time: 1–5 ms
- Cold time: 20–60 ms (depending on cache)
- Notes:
- DESC in the index matches ORDER BY DESC and avoids a sort
- INCLUDE enables index-only scans when the visibility map is mostly up to date
MySQL note:
- Use a composite index: KEY (user_id, occurred_at DESC); MySQL 8.0+ honors descending key parts. InnoDB secondary indexes can be covering when every selected column is in the index or the primary key. There is no INCLUDE syntax, so append the extra columns as trailing key parts (see the sketch below).
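A rough MySQL 8 equivalent, assuming a comparable events table (the index name and trailing columns are illustrative; InnoDB secondary indexes carry the primary key, so id is covered implicitly):
ALTER TABLE events
ADD INDEX idx_events_user_time_cover (user_id, occurred_at DESC, event_type, country_code);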
Query pattern 2: range by time within a partitioning key
Example: “Events in DE last 7 days ordered by time”.
SELECT id, occurred_at, user_id
FROM events
WHERE country_code = 'DE'
AND occurred_at >= now() - interval '7 days'
ORDER BY occurred_at ASC
LIMIT 500;
Index:
CREATE INDEX CONCURRENTLY idx_events_country_time
ON events (country_code, occurred_at);
Why this order:
- Filter on country_code first, then range on occurred_at
- Matches ORDER BY occurred_at for that country; often avoids explicit sort
After index:
- Warm time: 3–15 ms for selective countries
- If 'DE' is very common, consider a partial index by country (see the sketch below), or add more selectivity (e.g., event_type).
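A sketch of the country-scoped partial index, assuming 'DE' dominates the workload:
CREATE INDEX CONCURRENTLY idx_events_de_time
ON events (occurred_at)
WHERE country_code = 'DE';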
MySQL note:
- Same composite order: KEY(country_code, occurred_at)
- Ensure the query can use the leftmost prefix.
Query pattern 3: partial indexes for hot subsets (Postgres)
If most queries target event_type = 2 (e.g., purchases), use a partial index.
CREATE INDEX CONCURRENTLY idx_events_purchases_recent
ON events (occurred_at DESC)
WHERE event_type = 2;
Query:
SELECT id, user_id, occurred_at
FROM events
WHERE event_type = 2
AND occurred_at >= now() - interval '30 days'
ORDER BY occurred_at DESC
LIMIT 50;
Benefits:
- Smaller index → faster scans and updates
- Warm time: similar to pattern 1, often 1–5 ms
MySQL note:
- No partial (predicate) indexes exist; consider a separate table for the hot subset, or a generated column plus a secondary index or partitioning tuned to the predicate (a sketch follows).
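A minimal MySQL sketch of the generated-column workaround (column and index names are illustrative):
-- Expose the hot predicate as a stored flag, then index it with the time column
ALTER TABLE events
ADD COLUMN is_purchase TINYINT(1) GENERATED ALWAYS AS (event_type = 2) STORED,
ADD INDEX idx_events_purchase_time (is_purchase, occurred_at);
-- Query through the flag so the index applies:
-- SELECT id, user_id, occurred_at FROM events
-- WHERE is_purchase = 1 AND occurred_at >= NOW() - INTERVAL 30 DAY
-- ORDER BY occurred_at DESC LIMIT 50;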
Query pattern 4: case-insensitive prefix vs substring text search
Case-insensitive prefix:
-- Expression index; text_pattern_ops lets a B-tree serve prefix LIKE
-- regardless of the database collation
CREATE INDEX CONCURRENTLY idx_events_lower_payload_prefix
ON events (lower(payload) text_pattern_ops);
-- Query
SELECT id FROM events
WHERE lower(payload) LIKE 'abc%';
- A B-tree can serve anchored (prefix) LIKE patterns; with a non-C collation, the text_pattern_ops operator class above is what makes this work.
- Warm time: tens of ms depending on selectivity.
Substring search:
-- Postgres pg_trgm extension for trigram GIN
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY idx_events_payload_trgm
ON events USING GIN (payload gin_trgm_ops);
SELECT id FROM events
WHERE payload ILIKE '%abc%';
- GIN trigram turns a table scan into an index scan
- Warm time: often 10–40 ms for selective terms vs hundreds of ms to seconds without index
MySQL note:
- Use FULLTEXT for natural-language modes; for substring matching you may need n-gram parsers or an external search engine. For prefix LIKE, an index with a consistent collation helps. MySQL 8.0.13+ also supports functional indexes (implemented as hidden generated columns); for TEXT expressions such as lower(payload), a stored generated column with an explicit length is usually simpler (see the sketch below).
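A hedged MySQL sketch using a stored generated column (the 64-character cap mirrors the ~60-character payloads in this dataset and is otherwise arbitrary):
ALTER TABLE events
ADD COLUMN payload_lower VARCHAR(64)
GENERATED ALWAYS AS (LOWER(LEFT(payload, 64))) STORED,
ADD INDEX idx_events_payload_lower (payload_lower);
-- Prefix search can then use the index:
-- SELECT id FROM events WHERE payload_lower LIKE 'abc%';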
Query pattern 5: JSON attributes
Filter by JSON attribute:
-- Query
SELECT id FROM events
WHERE metadata ->> 'plan' = 'pro';
Postgres indexes:
-- Option A: GIN on jsonb for containment lookups
CREATE INDEX CONCURRENTLY idx_events_metadata_gin
ON events USING GIN (metadata jsonb_path_ops);
-- Then query with containment to use the index:
SELECT id FROM events
WHERE metadata @> '{"plan":"pro"}';
-- Option B: Functional B-tree if you only need a single key
CREATE INDEX CONCURRENTLY idx_events_plan
ON events ((metadata ->> 'plan'));
- B-tree functional index is small and fast for equality, often a few ms.
- GIN helps for multiple keys and more complex predicates.
MySQL note:
- Create a generated column and index it:
ALTER TABLE events
ADD COLUMN plan VARCHAR(10) AS (JSON_UNQUOTE(JSON_EXTRACT(metadata, '$.plan'))) STORED,
ADD INDEX idx_events_plan (plan);
Query pattern 6: joins on foreign keys
If you frequently join events.user_id to users.id, index both sides:
- users.id is PRIMARY KEY (already indexed)
- events.user_id needs a B-tree index:
CREATE INDEX CONCURRENTLY idx_events_user ON events (user_id);
Without it, fetching one user's events through the join requires a full scan of events, and larger joins fall back to hash or merge joins over the whole table.
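For reference, a join of this shape (against the users table referenced above; the ids are arbitrary) benefits directly from idx_events_user:
SELECT u.id, e.occurred_at, e.event_type
FROM users u
JOIN events e ON e.user_id = u.id
WHERE u.id = ANY (ARRAY[42, 4242, 424242])
ORDER BY e.occurred_at DESC
LIMIT 50;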
Benchmark snapshots (indicative)
- Latest 10 events for a user:
- No index: 800–1500 ms warm
- With (user_id, occurred_at DESC) covering: 1–5 ms warm, 20–60 ms cold
- Country + 7-day range:
- No index: 700–1200 ms warm with sort
- With (country_code, occurred_at): 3–15 ms warm, 25–80 ms cold
- JSON plan equality via functional index:
- No index: 400–900 ms warm
- With (metadata->>'plan'): 2–8 ms warm
- Substring payload search:
- No index: 2–6 s warm for unpopular term on 10M rows
- GIN trigram: 10–40 ms warm for selective term
Write overhead
Indexes accelerate reads but slow writes. Measure the impact:
- Baseline INSERT INTO events … (no secondary indexes): ~170k rows/s bulk insert
- With 3 additional indexes (user_id, user_id+occurred_at, country_code+occurred_at): ~75–100k rows/s
- With GIN on payload or metadata: expect another noticeable drop due to GIN maintenance
These are directional numbers; measure on your stack (a timing sketch follows).
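A rough throughput check in psql, a sketch only (the batch size is arbitrary; run it before and after creating the secondary indexes):
\timing on
INSERT INTO events (user_id, occurred_at, event_type, country_code, payload, metadata)
SELECT (1000 + (random()*900000))::int, now(), 1::smallint, 'US',
       substr(md5(random()::text), 1, 60), '{}'::jsonb
FROM generate_series(1, 1000000);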
How to measure correctly
- Warm vs cold cache:
- Warm: run the same query 2–3 times and take the best
- Cold: on Linux, drop caches between runs (needs root) or read enough unrelated data to evict
- Use EXPLAIN (ANALYZE, BUFFERS):
- Look for Index Scan/Index Only Scan vs Seq Scan
- Check Rows Removed by Filter; if it is high, the index is not selective enough for the query, so add the filtered column to the index or use a partial index
- Check stats:
- ANALYZE after big loads
- Ensure datatypes and collations match predicates
- Watch for implicit casts:
- A type mismatch between a predicate value and the column (for example, a numeric literal compared against a string column) can force a per-row cast and prevent index usage; keep bound parameter types aligned with column types (see the sketch after this list)
- Verify sort avoidance:
- If you filter on user_id and ORDER BY occurred_at, an index on (occurred_at, user_id) can return rows in order but filters most of them away; (user_id, occurred_at) serves both the filter and the sort
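To make the implicit-cast pitfall concrete, a MySQL-flavored sketch with a hypothetical VARCHAR column external_ref (not part of the schema above):
-- Numeric literal: MySQL compares as numbers and cannot use an index on external_ref
SELECT id FROM events WHERE external_ref = 42;
-- Quoted literal: string-to-string comparison, so the index on external_ref stays usable
SELECT id FROM events WHERE external_ref = '42';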
Selecting composite index order
- Put the most selective, equality-tested column first
- Follow with range/ordering columns
- Make the index order match your ORDER BY whenever possible
- One query pattern per index; if you have many patterns, consider query consolidation or different indexes per workload
Covering indexes
- Postgres: use INCLUDE for non-key columns to enable index-only scans
- MySQL/InnoDB: queries are often covered if selected columns are in the secondary index or the primary key
- Be mindful: larger indexes increase write cost and memory use
Partial indexes (Postgres)
- Great when a narrow, stable predicate covers most hot queries
- Example: WHERE event_type = 2 plus a fixed time cutoff. Index predicates must use immutable expressions, so now() cannot appear in them; pin a constant cutoff and recreate the index periodically (see the sketch below)
- They reduce index size and maintenance cost
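A sketch of a recent-purchases partial index with a pinned cutoff (the date is arbitrary; recreate the index to move the window forward):
CREATE INDEX CONCURRENTLY idx_events_purchases_2024
ON events (occurred_at DESC)
WHERE event_type = 2
  AND occurred_at >= TIMESTAMPTZ '2024-01-01 00:00:00+00';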
Functional indexes
- Index expressions used in WHERE/ORDER BY (the expressions must use only immutable functions):
- lower(email), date_trunc('day', occurred_at AT TIME ZONE 'UTC'), coalesce(field, 'x')
- MySQL 8: use generated columns and index them
Full-text and fuzzy search
- Postgres: GIN with pg_trgm for substring; GIN or GiST with to_tsvector for full-text
- MySQL: FULLTEXT indexes on InnoDB for natural language and boolean mode
- For complex search, consider dedicated search engines
Operational tips
- Don’t over-index; every index slows writes and VACUUM
- Rebuild bloated indexes if needed (REINDEX CONCURRENTLY in Postgres)
- Monitor with pg_stat_io, pg_stat_statements, and auto_explain (starter queries below)
- Consider partitioning by time for very large tables and to improve pruning
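Two starter monitoring queries, assuming pg_stat_statements is installed (the limits are arbitrary):
-- Most expensive statements by total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Indexes that have never been scanned (candidates for removal)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;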
Minimal reproducible script (Postgres)
- Create and load:
-- Run DDL + INSERT from above
ANALYZE events;
- Baseline query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, occurred_at, event_type, country_code
FROM events
WHERE user_id = 424242
ORDER BY occurred_at DESC
LIMIT 10;
- Add index and re-run:
CREATE INDEX CONCURRENTLY idx_events_user_time_desc_cover
ON events (user_id, occurred_at DESC)
INCLUDE (event_type, country_code);
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, occurred_at, event_type, country_code
FROM events
WHERE user_id = 424242
ORDER BY occurred_at DESC
LIMIT 10;
Cheat sheet
- Equality filter: B-tree on column; for multi-column, equality columns first
- Range filter with sort: composite (filter_cols, order_col [ASC|DESC]) to avoid sort
- Latest N rows per key: (key, time DESC) plus covering columns
- Frequent hot subset in Postgres: partial index
- Expression predicates: functional or generated-column index
- Substring search: trigram GIN (Postgres) or rethink approach in MySQL
- JSON attribute equality: functional index (or generated column in MySQL)
- Always measure with EXPLAIN ANALYZE, warm and cold
With a few targeted indexes and a disciplined measurement loop, you can turn multi-second scans into millisecond responses and keep write overhead under control.