Using PostgreSQL JSONB for Dynamic Schemas

Team · 6 min read

#postgresql #jsonb #schema-design #data-modeling

Dynamic product catalogs, event payloads, integrations, and user-defined settings often require a schema that evolves quickly. PostgreSQL’s JSONB type gives you the flexibility of schemaless documents with the safety, performance, and tooling of a relational database. This guide shows how to design, query, validate, and scale JSONB-backed models without painting yourself into a corner.

Why JSONB over JSON

  • JSONB is a decomposed binary format that supports efficient indexing and a rich set of operators. JSON stores the input text verbatim and must be reparsed on each access, so it is generally slower to query.
  • JSONB collapses duplicate keys (the last value wins) and does not preserve key order; JSON preserves both. The example below shows the difference.
  • JSONB is the default choice for production workloads where you need to query into the document.
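
For example:

select '{"a": 1, "a": 2}'::json  as as_json,   -- stored verbatim: {"a": 1, "a": 2}
       '{"a": 1, "a": 2}'::jsonb as as_jsonb;  -- duplicates collapsed: {"a": 2}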

Core modeling patterns

  1. Hybrid model (recommended for most apps). Keep stable, frequently filtered fields as typed columns and put variable attributes in a jsonb column.
  • Pros: clear constraints on core data; flexible extensions; smaller JSONB values.
  • Cons: you still need migrations when core fields change.

Example:

create table products (
  id bigserial primary key,
  sku text not null unique,
  name text not null,
  price_cents integer not null check (price_cents >= 0),
  attrs jsonb not null default '{}'::jsonb,
  created_at timestamptz not null default now()
);
  2. Document-first model. Store the entire object in one JSONB column when the structure is highly volatile or you’re ingesting third-party payloads verbatim.
  • Pros: zero upfront schema cost.
  • Cons: weaker constraints; more expensive updates; harder analytics; more reliance on expression indexes.
  3. Multiple JSONB columns by concern. Split attributes (e.g., public vs private, computed vs user-supplied) into separate JSONB columns for update locality and simpler permissions; a sketch follows below.
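
A minimal sketch of the split-by-concern pattern; the product_details table and column names here are illustrative, not prescribed:

create table product_details (
  product_id     bigint primary key references products(id) on delete cascade,
  public_attrs   jsonb not null default '{}'::jsonb, -- user-visible, user-supplied
  internal_attrs jsonb not null default '{}'::jsonb  -- computed or staff-only
);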

JSONB querying essentials

  • -> returns jsonb; ->> returns text
  • #> and #>> traverse by path arrays
  • @> containment: left contains right
  • ? key existence; ?| any key; ?& all keys
  • jsonb_set and jsonb_insert to update
  • jsonb_path_exists/jsonb_path_query for JSONPath queries

Examples:

-- Read a nested attribute
select attrs -> 'dimensions' ->> 'width' as width
from products
where id = 42;

-- Find products with a given attribute
select id, name
from products
where attrs ? 'color';

-- Containment: attrs includes at least these key/value pairs
select id
from products
where attrs @> '{"color": "red", "in_stock": true}';

-- JSONPath (complex predicates)
select id
from products
where jsonb_path_exists(attrs, '$.variants[*] ? (@.price > 100)');
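
The path operators #> and #>> from the list above, applied to the same nested attribute:

select attrs #> '{dimensions}'        as dimensions, -- jsonb
       attrs #>> '{dimensions,width}' as width       -- text
from products
where id = 42;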

Indexing strategies that actually work

  1. GIN index on the whole document
  • Good for containment (@>) and key existence (?) queries.
create index products_attrs_gin on products using gin (attrs);
-- Or smaller/faster for containment-only:
-- create index products_attrs_gin_path on products using gin (attrs jsonb_path_ops);

Notes:

  • jsonb_ops (default) supports more operators; jsonb_path_ops indexes are smaller and faster but support only containment (@>) and JSONPath matches, not the key-existence operators (?, ?|, ?&).
  2. Expression indexes for hot fields
  • Equality lookups on a single attribute:
create index products_color_idx on products ((attrs ->> 'color'));
  • Case-insensitive lookup (if you use citext):
create extension if not exists citext;
create index products_color_citext_idx on products (( (attrs ->> 'color')::citext ));
  3. Partial indexes (sparse attributes)
  • Only index rows that have the key:
create index products_color_partial_idx
  on products ((attrs ->> 'color'))
  where attrs ? 'color';
  4. Generated columns + indexes (for constraints and planner statistics)
alter table products
add column color text generated always as (attrs ->> 'color') stored;

create index products_color_btree on products (color);
alter table products add constraint color_check check (color is null or color <> '');
  • Benefits: easy constraints; better planner statistics; simpler queries.

Validation without losing flexibility

  • Enforce basic shape with CHECK:
alter table products add constraint attrs_is_object
  check (jsonb_typeof(attrs) = 'object');

alter table products add constraint attrs_tags_is_array
  check (not (attrs ? 'tags') or jsonb_typeof(attrs->'tags') = 'array');
  • Guarantee required keys:
alter table products add constraint attrs_requires_color
  check (attrs ? 'color');
  • Type checks via generated columns:
alter table products
add column weight_grams integer
generated always as ((attrs->>'weight_grams')::integer) stored;

alter table products
add constraint weight_grams_nonneg check (weight_grams is null or weight_grams >= 0);
  • Full JSON Schema validation: use a CHECK constraint or trigger backed by an extension like pg_jsonschema if you need rigorous schema validation at write time; a sketch follows below.
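
A sketch of enforcement via a CHECK constraint, assuming pg_jsonschema is installed and exposes a jsonb_matches_schema function (verify the function name against the extension docs for your version):

create extension if not exists pg_jsonschema;

alter table products add constraint attrs_matches_schema check (
  -- assumed pg_jsonschema API: jsonb_matches_schema(schema, instance)
  jsonb_matches_schema(
    '{"type": "object", "properties": {"color": {"type": "string"}}}',
    attrs
  )
);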

Updating JSONB safely

  • Any update rewrites the entire JSONB value (there is no in-place partial update), so prefer small documents and hybrid models for frequently changing fields. Note also that jsonb_set only creates the final path element; earlier steps such as 'dimensions' below must already exist, or the target is returned unchanged.
update products
set attrs = jsonb_set(attrs, '{dimensions,width}', to_jsonb(42), true)
where id = 42;

-- Remove a key
update products
set attrs = attrs - 'legacyField'
where id = 42;
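
jsonb_insert is the counterpart for arrays. A sketch that prepends a hypothetical tag, guarded so it only touches rows where attrs->'tags' really is an array:

-- Insert an element at the front of an array
update products
set attrs = jsonb_insert(attrs, '{tags,0}', '"clearance"'::jsonb)
where id = 42
  and jsonb_typeof(attrs -> 'tags') = 'array';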

Common query patterns and pitfalls

  • Missing vs null (see the sketch after this list):
    • Key missing: not (attrs ? 'foo')
    • Key present but JSON null: attrs ? 'foo' and attrs ->> 'foo' is null. Note that attrs -> 'foo' yields the jsonb value 'null' here, which is not SQL NULL, so attrs -> 'foo' is null only detects a missing key.
  • Types must match for @>: attrs @> '{"qty": 1}' will not match a row where qty is stored as the string "1".
  • Avoid deep, highly nested documents; expression indexes on deep paths get brittle and slow.
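
A sketch that surfaces all three states for a hypothetical 'foo' key:

select id,
       not (attrs ? 'foo')                        as key_missing,
       attrs ? 'foo' and attrs ->> 'foo' is null  as key_is_json_null,
       attrs ->> 'foo'                            as value_as_text
from products;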

Performance and storage considerations

  • JSONB values are TOASTed; updates rewrite the full value. If a few attributes change often, pull them out into dedicated columns.
  • Keep documents lean. Split unrelated concerns into separate JSONB columns or tables.
  • Analyze and index expressions you filter on. The planner has limited visibility into JSONB internals; generated columns improve statistics.
  • Use EXPLAIN ANALYZE to confirm index usage (example after this list); prefer equality or containment queries that align with your indexes.
  • Vacuum and bloat: high-churn JSONB updates increase table and TOAST bloat. Consider autovacuum tuning and smaller JSONB chunks.
  • Batch writes to reduce WAL pressure. Consider logical decoding or CDC if downstream systems need dynamic payloads.
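
A minimal check that the GIN index from earlier is actually used (plan details vary with table size and statistics):

explain (analyze, buffers)
select id
from products
where attrs @> '{"color": "red"}';
-- On a large enough table, expect a Bitmap Index Scan on products_attrs_gin.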

Migration patterns (when fields stabilize)

  1. Promote to a typed column
alter table products add column material text;
update products set material = attrs->>'material' where attrs ? 'material';
-- only once every row has a value (rows missing the attribute will block this):
alter table products alter column material set not null;

-- Optional: drop from JSONB
update products set attrs = attrs - 'material';
  2. Backfill and tighten constraints
  • Create a generated column, backfill the source attribute if needed, and add NOT NULL and unique constraints once the data is clean, as in the sketch below.
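
A sketch of the generated-column route, using a hypothetical 'ean' attribute; the constraint name is illustrative:

alter table products
  add column ean text generated always as (attrs ->> 'ean') stored;

-- once the underlying attrs data is verified clean:
alter table products alter column ean set not null;
alter table products add constraint products_ean_unique unique (ean);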

Security and governance

  • Row-Level Security works the same as for any table; write policies that mix relational columns and JSONB expressions (see the sketch after this list).
  • Validate and sanitize user-supplied JSON. Avoid building dynamic SQL from JSON keys or values.
  • Log rejected writes with triggers to spot bad producers early.
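
A sketch of such a policy; the 'visible' attribute and policy name are hypothetical:

alter table products enable row level security;

create policy products_select_visible on products
  for select
  using (
    price_cents >= 0
    and coalesce((attrs ->> 'visible')::boolean, true) -- hypothetical per-row flag
  );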

When not to use JSONB

  • You need cross-row constraints, complex joins, or heavy analytics across attributes. Normalize instead.
  • You update large documents frequently. Prefer a normalized model or smaller JSONB shards.
  • You require strict referential integrity on dynamic fields. Consider mapping tables, as sketched below.
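
For the referential-integrity case, a minimal mapping-table sketch (the tags and product_tags tables are illustrative):

create table tags (
  id   bigserial primary key,
  name text not null unique
);

create table product_tags (
  product_id bigint not null references products(id) on delete cascade,
  tag_id     bigint not null references tags(id),
  primary key (product_id, tag_id)
);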

A practical checklist

  • Identify stable vs dynamic fields; put stable ones in columns.
  • Decide on JSONB keys, types, and naming conventions.
  • Add the minimum viable constraints (object type, required keys, basic types).
  • Create targeted indexes: GIN for containment, expression indexes for hot attributes, partial indexes for sparse data.
  • Use generated columns where you filter/aggregate often.
  • Monitor with EXPLAIN ANALYZE and pg_stat_statements; iterate on indexes.
  • Have a plan to promote frequently used JSONB fields to columns.

Conclusion

JSONB lets you move quickly without giving up PostgreSQL’s strengths. Start hybrid, index the queries you actually run, validate the essentials, and keep a clear path to promote stabilized fields into first-class columns. Done well, JSONB provides the flexibility of schemaless documents with the reliability and performance of a mature relational database.