How to Implement Offline-First Sync Systems Like Notion

Team 5 min read

#offline-first

#webdev

#tutorial

#collaboration

Overview

Offline-first synchronization lets users keep working when the network is unavailable, then merges their changes once connectivity returns. In a Notion-like editor, this means local edits, block-level updates, and changes made concurrently on other devices must converge without silently overwriting anyone's work. This post outlines a practical approach: a data model built for merging, explicit conflict handling, and a UI that keeps users informed about sync state.

Core concepts

  • Local-first data store: All edits are captured locally before being synced, ensuring fast feedback and uninterrupted work.
  • Conflict-free data types: CRDTs (Conflict-Free Replicated Data Types) let multiple concurrent edits converge deterministically.
  • Append-only operation logs: Changes are recorded as ops or deltas with causal metadata to facilitate reconciliation.
  • Block-based document model: Notion-style editors organize content as blocks; syncing operates at the block or sub-block level to preserve structure.
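To make "converge deterministically" concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins (LWW) register per block. It is an illustration under assumptions, not how Notion or any particular library works: the `Stamp`, `Register`, `BlockMap`, and `merge` names are hypothetical, and whole-block LWW discards the losing edit, whereas text CRDTs merge at a finer granularity.

```typescript
// One value plus the metadata used to pick a winner between concurrent writes.
type Stamp = { clock: number; actor: string };
type Register = { value: string; stamp: Stamp };

// Map from block ID to that block's register (plain object so it serializes as JSON).
type BlockMap = Record<string, Register>;

// Total order on stamps: higher logical clock wins, actor ID breaks ties.
function newer(a: Stamp, b: Stamp): boolean {
  return a.clock !== b.clock ? a.clock > b.clock : a.actor > b.actor;
}

// Merge keeps the newer register for every block ID. Because it is a
// per-key maximum, it is commutative, associative, and idempotent.
function merge(a: BlockMap, b: BlockMap): BlockMap {
  const out: BlockMap = { ...a };
  for (const id of Object.keys(b)) {
    const current = out[id];
    if (current === undefined || newer(b[id].stamp, current.stamp)) {
      out[id] = b[id];
    }
  }
  return out;
}

// Two replicas that edited while disconnected.
const laptop: BlockMap = {
  b1: { value: "Hello", stamp: { clock: 2, actor: "laptop" } },
};
const phone: BlockMap = {
  b1: { value: "Hi", stamp: { clock: 1, actor: "phone" } },
  b2: { value: "New block", stamp: { clock: 1, actor: "phone" } },
};

// Merging in either direction yields the same state.
const ab = merge(laptop, phone);
const ba = merge(phone, laptop);
```

Both `ab` and `ba` hold the same two blocks, which is the property the rest of the sync pipeline relies on: replicas can exchange state in any order and still agree.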

Architecture patterns

  • Local database and change log: Use a local store (e.g., IndexedDB) to persist documents, blocks, and a history of operations.
  • Sync engine: A background service or worker handles outbound/inbound changes, retries, and backoff.
  • Server-side merge: A central service applies incoming changes, resolves conflicts using CRDTs or OT-like strategies, and broadcasts merged updates.
  • Data model separation: Distinguish document state (blocks, metadata) from synchronization state (sequence numbers, actor IDs).
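The retry and backoff behavior of the sync engine can be sketched as follows. This is a simplified illustration: `OutboundQueue`, `backoffDelay`, and the synchronous `send` callback are hypothetical names, and a real transport would be asynchronous. The delay calculation uses exponential backoff with random jitter so that many clients reconnecting at once do not retry in lockstep.

```typescript
// Full-jitter exponential backoff: a random delay below min(cap, base * 2^attempt).
function backoffDelay(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 30000,
  random: () => number = Math.random, // injectable so tests can be deterministic
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

type QueuedOp = { id: string; payload: string };
type SendResult = "ok" | "retry";

class OutboundQueue {
  private pending: QueuedOp[] = [];
  private attempt = 0;

  enqueue(op: QueuedOp): void {
    this.pending.push(op);
  }

  size(): number {
    return this.pending.length;
  }

  // Try to send everything queued so far. Returns the number of milliseconds
  // to wait before the next attempt, or null when nothing is left to send.
  flushOnce(
    send: (batch: QueuedOp[]) => SendResult,
    random: () => number = Math.random,
  ): number | null {
    if (this.pending.length === 0) return null;
    const batch = this.pending.slice();
    if (send(batch) === "ok") {
      // Remove only what was sent; with an async transport, new ops
      // could have been queued while the request was in flight.
      this.pending.splice(0, batch.length);
      this.attempt = 0;
      return this.pending.length === 0 ? null : 0;
    }
    // Failed: keep the ops queued and back off a little longer each time.
    const delay = backoffDelay(this.attempt, 500, 30000, random);
    this.attempt += 1;
    return delay;
  }
}
```

A background worker would call `flushOnce` in a loop, sleeping for the returned delay, and would persist the queue in the local store so pending ops survive a page reload.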

Data model decisions

  • Block-based documents: Represent documents as a tree of blocks (paragraphs, headings, lists, media). Each block has a stable ID, parent linkage, and version vector.
  • Identifiers: Generate globally unique IDs for blocks and documents to avoid clashes during offline edits.
  • Versioning: Track causal metadata (actor ID and logical clock) to order edits and detect conflicts. Keep wall-clock timestamps for display only, since device clocks drift and cannot be trusted for ordering.
  • Rich metadata: Attach provenance data (who edited what and when) to simplify UX for conflict resolution.
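As a rough illustration of these decisions, the sketch below models a block with a stable ID, a parent link, a version vector, and provenance fields. All names (`Block`, `IdGenerator`, `createBlock`, `editBlock`, `childrenOf`) are hypothetical, and sibling ordering is deliberately left out. IDs of the form `actor:counter` are unique without coordination, provided actor IDs themselves are unique.

```typescript
type ActorId = string;
type VersionVector = Record<ActorId, number>;
type BlockType = "paragraph" | "heading" | "list" | "media";

interface Block {
  id: string;              // stable and globally unique
  parentId: string | null; // null for the document root
  type: BlockType;
  text: string;
  version: VersionVector;  // causal metadata: edits seen per actor
  editedBy: ActorId;       // provenance: who made the last edit
  editedAt: number;        // wall-clock ms, for display only (never for ordering)
}

// Generates "<actor>:<counter>" IDs, which cannot clash across devices
// as long as every device has its own actor ID.
class IdGenerator {
  private counter = 0;
  private readonly actor: ActorId;
  constructor(actor: ActorId) {
    this.actor = actor;
  }
  next(): string {
    this.counter += 1;
    return `${this.actor}:${this.counter}`;
  }
}

function createBlock(
  ids: IdGenerator, actor: ActorId, parentId: string | null,
  type: BlockType, text: string, now: number,
): Block {
  return {
    id: ids.next(), parentId, type, text,
    version: { [actor]: 1 }, editedBy: actor, editedAt: now,
  };
}

// Returns a new block with the editing actor's entry in the version vector bumped.
function editBlock(block: Block, actor: ActorId, text: string, now: number): Block {
  const version = { ...block.version, [actor]: (block.version[actor] ?? 0) + 1 };
  return { ...block, text, version, editedBy: actor, editedAt: now };
}

// The tree is stored flat; children are found through the parent link.
function childrenOf(blocks: Block[], parentId: string): Block[] {
  return blocks.filter((b) => b.parentId === parentId);
}
```

Storing the tree as a flat list keyed by ID keeps moves cheap (only `parentId` changes) and lets each block be merged independently.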

Conflict resolution strategies

  • CRDT-based convergence: Use a CRDT library to automatically merge concurrent edits at the block level. Prefer operations that are commutative, associative, and idempotent, so updates can arrive in any order, or more than once, without changing the result.
  • Intent-preserving merges: When automatic merging is not possible, surface the conflict in the UI with clear options (keep local, accept remote, or merge manually).
  • Graceful degradation: Keep the rest of the document editable while a conflict is pending, and retain both versions of each conflicting block until the user picks one.
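For the fallback path, one way to decide whether two versions of a block actually conflict is to compare their version vectors: if neither has seen all of the other's edits, the edits were concurrent. The sketch below assumes that approach; `compare`, `reconcile`, and `resolveConflict` are hypothetical names, not part of any library.

```typescript
type VersionVector = Record<string, number>;
type Ordering = "equal" | "before" | "after" | "concurrent";

// Compares two version vectors entry by entry (missing entries count as 0).
function compare(a: VersionVector, b: VersionVector): Ordering {
  let aAhead = false;
  let bAhead = false;
  for (const actor of Object.keys({ ...a, ...b })) {
    const x = a[actor] ?? 0;
    const y = b[actor] ?? 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

type BlockVersion = { text: string; version: VersionVector };
type MergeResult =
  | { kind: "resolved"; block: BlockVersion }
  | { kind: "conflict"; local: BlockVersion; remote: BlockVersion };

// If one side has seen everything the other did, take it. Otherwise keep
// both versions and let the UI ask the user.
function reconcile(local: BlockVersion, remote: BlockVersion): MergeResult {
  switch (compare(local.version, remote.version)) {
    case "equal":
    case "after":
      return { kind: "resolved", block: local };
    case "before":
      return { kind: "resolved", block: remote };
    case "concurrent":
      return { kind: "conflict", local, remote };
  }
}

// Applies the user's choice. `text` is local.text, remote.text, or a manual merge.
// The new version dominates both inputs, so other replicas accept it without
// raising the conflict again.
function resolveConflict(
  local: BlockVersion, remote: BlockVersion, text: string, actor: string,
): BlockVersion {
  const version: VersionVector = {};
  for (const a of Object.keys({ ...local.version, ...remote.version })) {
    version[a] = Math.max(local.version[a] ?? 0, remote.version[a] ?? 0);
  }
  version[actor] = (version[actor] ?? 0) + 1;
  return { text, version };
}
```

The three UI options map onto the `text` argument of `resolveConflict`: pass the local text to keep local, the remote text to accept remote, or an edited string for a manual merge.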

Synchronization protocol

  • Local change capture: Record edits as operations in a local log with metadata (author, timestamp, parent IDs).
  • Outbound sync: Push changes to the server when online; batch and compress queued ops, and attach the client's last known version (for example, a state vector) so the server can tell what the client is missing.
  • Inbound sync: Receive merged changes from the server and apply them locally, merging them with any edits made on this device in the meantime using the same CRDT rules.
  • Causal delivery: Ensure changes are applied in a causally consistent order to avoid misalignment of block relationships.
  • Connectivity handling: Implement exponential backoff, offline queueing, and reliable retries to tolerate intermittent networks.
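Causal delivery can be sketched as a buffer that holds an incoming op until everything it depends on has been applied. This is a minimal illustration under assumptions: each op carries its author's sequence number plus a `deps` version vector, and `SyncOp` and `CausalBuffer` are hypothetical names. Libraries such as Yjs handle this internally, so a buffer like this is only needed when you manage your own op log.

```typescript
type VersionVector = Record<string, number>;

interface SyncOp {
  actor: string;
  seq: number;          // 1, 2, 3, ... per actor
  deps: VersionVector;  // what the author had seen when creating the op
  description: string;
}

class CausalBuffer {
  private seen: VersionVector = {};
  private waiting: SyncOp[] = [];
  readonly applied: string[] = []; // descriptions, in the order they were applied

  // An op is ready when it is the next one from its actor and all of its
  // dependencies on other actors have already been applied.
  private ready(op: SyncOp): boolean {
    if (op.seq !== (this.seen[op.actor] ?? 0) + 1) return false;
    return Object.keys(op.deps).every(
      (a) => a === op.actor || (this.seen[a] ?? 0) >= (op.deps[a] ?? 0),
    );
  }

  receive(op: SyncOp): void {
    // Ignore duplicates: already applied, or already buffered.
    if (op.seq <= (this.seen[op.actor] ?? 0)) return;
    if (this.waiting.some((w) => w.actor === op.actor && w.seq === op.seq)) return;
    this.waiting.push(op);
    this.drain();
  }

  // Keep applying buffered ops until a full pass makes no progress.
  private drain(): void {
    let progressed = true;
    while (progressed) {
      progressed = false;
      const stillWaiting: SyncOp[] = [];
      for (const op of this.waiting) {
        if (this.ready(op)) {
          this.seen[op.actor] = op.seq;
          this.applied.push(op.description);
          progressed = true;
        } else {
          stillWaiting.push(op);
        }
      }
      this.waiting = stillWaiting;
    }
  }

  pendingCount(): number {
    return this.waiting.length;
  }
}

// The child arrives before the parent it was attached to.
const buffer = new CausalBuffer();
buffer.receive({ actor: "phone", seq: 1, deps: { laptop: 1 }, description: "add child under parent" });
const heldBack = buffer.pendingCount(); // the child op waits for its parent
buffer.receive({ actor: "laptop", seq: 1, deps: {}, description: "create parent block" });
```

After the second `receive`, the parent is applied first and the buffered child follows, so the block tree never contains a child whose parent is missing.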

Implementation stack (practical options)

  • Client: Any UI framework (React, Astro, or none), with a UI that reflects real-time edits and conflict status.
  • Local storage: IndexedDB (for complex objects) or SQLite via a WASM layer if you need richer queries.
  • Sync transport: WebSocket for real-time updates or HTTP long-polling for server-triggered pushes; consider WebRTC for decentralized setups.
  • Synchronization core: CRDT libraries such as Yjs or Automerge for deterministic convergence.
  • Server: A merge service that applies incoming CRDT updates, stores document state, and broadcasts merged changes to all clients.
  • Security: Encrypt data at rest and in transit; manage keys to protect sensitive blocks or documents.

Example integration pattern

  • Use Yjs for CRDT merging and IndexedDB for local persistence.
  • Initialize a Y.Doc in each client, attach a WebSocket to exchange updates, and persist the document state to IndexedDB after every change (for example, via the y-websocket and y-indexeddb packages).

Pseudo-workflow:

  • User edits a block -> Yjs applies the local change and emits an update.
  • Update is serialized and saved to the local oplog in IndexedDB.
  • Update is sent to the server via WebSocket, or queued if the client is offline.
  • Server merges updates into a canonical state (using CRDT semantics) and relays them to the other connected clients.
  • Clients apply inbound updates to their local Y.Doc and persist them to IndexedDB. Yjs updates are commutative and idempotent, so arrival order does not change the result.

Code sketch (conceptual):

  • Initialize Y.js document and binding to UI
  • Persist doc state to IndexedDB on every change
  • Listen for remote updates and apply them to Y.Doc

Note: This is a high-level sketch. When implementing, tailor the APIs to your tech stack and ensure correct handling of block relationships and nested structures.
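The workflow above can be simulated end to end without any dependencies. In the sketch below, every piece is a stand-in and every name is hypothetical: a last-writer-wins map plays the role of the Y.Doc, an array plays the role of the IndexedDB oplog, and direct method calls play the role of the WebSocket. Unlike Yjs, whole-block last-writer-wins discards the losing edit, so treat this as a model of the data flow rather than of the merge quality.

```typescript
type Update = { blockId: string; text: string; clock: number; actor: string };

class Replica {
  readonly doc: Record<string, Update> = {}; // current state per block
  readonly oplog: Update[] = [];             // stand-in for the IndexedDB oplog
  readonly outbox: Update[] = [];            // local updates not yet sent
  readonly actor: string;
  online = true;
  private clock = 0;                         // Lamport clock

  constructor(actor: string) {
    this.actor = actor;
  }

  // Applies an update if it beats the current value. Safe to call twice
  // with the same update. Returns true if the state changed.
  apply(u: Update): boolean {
    const cur = this.doc[u.blockId];
    const wins =
      cur === undefined ||
      u.clock > cur.clock ||
      (u.clock === cur.clock && u.actor > cur.actor);
    this.clock = Math.max(this.clock, u.clock);
    if (wins) {
      this.doc[u.blockId] = u;
      this.oplog.push(u); // persist before anything goes on the wire
    }
    return wins;
  }

  // A local edit: apply immediately, then queue for the server.
  edit(blockId: string, text: string): void {
    const u: Update = { blockId, text, clock: this.clock + 1, actor: this.actor };
    this.apply(u);
    this.outbox.push(u);
  }
}

class Server {
  readonly canonical = new Replica("server");
  readonly clients: Replica[] = [];

  // Called when a client is online: merge its queued updates, relay the
  // ones that changed canonical state, then send it what it missed.
  sync(from: Replica): void {
    for (const u of from.outbox.splice(0)) {
      if (this.canonical.apply(u)) {
        for (const c of this.clients) {
          if (c !== from && c.online) c.apply(u);
        }
      }
    }
    for (const id of Object.keys(this.canonical.doc)) {
      from.apply(this.canonical.doc[id]);
    }
  }
}

const server = new Server();
const laptopReplica = new Replica("laptop");
const phoneReplica = new Replica("phone");
server.clients.push(laptopReplica, phoneReplica);

phoneReplica.online = false;             // phone goes offline
laptopReplica.edit("b1", "Hello");       // laptop edits and syncs right away
server.sync(laptopReplica);
phoneReplica.edit("b1", "Hi");           // phone edits the same block offline
phoneReplica.edit("b2", "Offline note"); // and adds a new one
phoneReplica.online = true;              // phone reconnects
server.sync(phoneReplica);
```

After the final `sync`, both replicas hold identical content for `b1` and `b2`. Swapping the stand-ins for a Y.Doc, an IndexedDB store, and a WebSocket keeps the same shape: apply locally, persist, queue, send, and apply whatever the server relays back.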

UI and UX considerations

  • Conflict indicators: Clearly flag blocks with unresolved conflicts and offer inline resolution actions.
  • Offline indicators: Show connectivity status and the pending change count.
  • Previews and snapshots: Let users preview how the merged result will look after sync, before applying it.
  • Performance feedback: Debounce updates to the UI during large edits to avoid jank.
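The offline and conflict indicators can be derived from a small piece of sync state. The sketch below is one possible shape; the `SyncState` and `Badge` types, the `statusBadge` function, and the label wording are all assumptions made for illustration.

```typescript
type SyncState = {
  online: boolean;
  pendingOps: number;           // local changes not yet acknowledged by the server
  conflictedBlockIds: string[]; // blocks waiting for a user decision
};

type Badge = { level: "ok" | "info" | "warning"; label: string };

// "1 change" / "3 changes"
function plural(n: number, word: string): string {
  return `${n} ${word}${n === 1 ? "" : "s"}`;
}

// Conflicts take priority because they need user action; offline and
// syncing states are informational.
function statusBadge(s: SyncState): Badge {
  if (s.conflictedBlockIds.length > 0) {
    return { level: "warning", label: `Conflicts in ${plural(s.conflictedBlockIds.length, "block")}` };
  }
  if (!s.online) {
    const label = s.pendingOps > 0
      ? `Offline: ${plural(s.pendingOps, "change")} saved locally`
      : "Offline";
    return { level: "info", label };
  }
  if (s.pendingOps > 0) {
    return { level: "info", label: `Syncing ${plural(s.pendingOps, "change")}` };
  }
  return { level: "ok", label: "All changes synced" };
}
```

Keeping this as a pure function of sync state makes it easy to test and lets any UI framework render the result.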

Testing and observability

  • Network partition testing: Simulate long offline periods and flaky networks to verify reconciliation correctness.
  • Deterministic replays: Reproduce issues by replaying a known sequence of edits and verifying the final state matches expectations.
  • Metrics: Track time-to-merge, conflict rate, and divergence between clients.
  • Logging: Emit structured logs for local edits, outbound/inbound changes, and merge decisions to facilitate debugging.
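A deterministic replay harness can be as small as the sketch below: replay the same ops in every possible delivery order and check that the final state is always the same. The names (`replay`, `permutations`, `convergesUnderAnyOrder`) are hypothetical, and the two merge functions are illustrative stand-ins for whatever merge logic you are testing. Exhaustive permutation only scales to a handful of ops; for longer histories, random shuffles with a logged seed are a common substitute.

```typescript
type EditOp = { blockId: string; text: string; clock: number; actor: string };
type DocState = Record<string, EditOp>;
type ApplyFn = (state: DocState, op: EditOp) => DocState;

// Correct merge: last writer wins by (clock, actor), independent of arrival order.
const lwwApply: ApplyFn = (state, op) => {
  const cur = state[op.blockId];
  const wins =
    cur === undefined ||
    op.clock > cur.clock ||
    (op.clock === cur.clock && op.actor > cur.actor);
  return wins ? { ...state, [op.blockId]: op } : state;
};

// Deliberately broken merge: whichever op arrives last wins.
const arrivalOrderApply: ApplyFn = (state, op) => ({ ...state, [op.blockId]: op });

function replay(ops: EditOp[], apply: ApplyFn): DocState {
  return ops.reduce<DocState>((state, op) => apply(state, op), {});
}

// All orderings of a small list.
function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items.slice()];
  const out: T[][] = [];
  items.forEach((item, i) => {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    for (const p of permutations(rest)) out.push([item].concat(p));
  });
  return out;
}

// Order-independent summary of a document's visible content.
function fingerprint(state: DocState): string {
  return Object.keys(state).sort().map((id) => id + "=" + state[id].text).join("|");
}

function convergesUnderAnyOrder(ops: EditOp[], apply: ApplyFn): boolean {
  const expected = fingerprint(replay(ops, apply));
  return permutations(ops).every((p) => fingerprint(replay(p, apply)) === expected);
}

// Three edits to the same block, two of them concurrent.
const recordedOps: EditOp[] = [
  { blockId: "b1", text: "A", clock: 1, actor: "laptop" },
  { blockId: "b1", text: "B", clock: 1, actor: "phone" },
  { blockId: "b1", text: "C", clock: 2, actor: "laptop" },
];
const lwwConverges = convergesUnderAnyOrder(recordedOps, lwwApply);
const naiveConverges = convergesUnderAnyOrder(recordedOps, arrivalOrderApply);
```

Here `lwwConverges` is true and `naiveConverges` is false, which is exactly the kind of regression this harness is meant to catch. Feeding it op sequences captured from structured logs turns a bug report into a repeatable test.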

Security and privacy considerations

  • Data at rest: Encrypt sensitive document blocks locally.
  • Data in transit: Use TLS for all sync traffic, which covers both encryption and integrity on the wire.
  • Access control: Enforce per-document permissions and block-level access rules.
  • Auditability: Preserve a tamper-evident log of edits to support traceability and rollback if needed.

Putting it together: a practical blueprint

  1. Model documents as block trees with stable IDs and causal metadata.
  2. Implement a local CRDT-based core to merge concurrent edits deterministically.
  3. Persist state and change logs in a local store (IndexedDB) with a clean API surface.
  4. Establish a reliable, authenticated sync channel to a central server that performs CRDT-based merges.
  5. Build UX around clear conflict resolution, offline status, and seamless re-sync.
  6. Add testing harnesses that simulate offline work, concurrent edits, and network failures.
  7. Prioritize security with encryption, access controls, and auditing.

Conclusion

Offline-first sync makes Notion-style collaboration possible by combining a block-based data model, CRDT-based convergence, and careful UX design. If every edit is applied locally first, merges are deterministic, and the sync pipeline tolerates retries and reordering, documents stay consistent across devices and unreliable networks.