GraphQL Performance Tricks: Caching, Batching, and Persisted Queries
#graphql
#performance
#caching
#batching
#persisted-queries
#webdev
Introduction
GraphQL offers a powerful way to fetch exactly what you need, but it can become a performance pitfall if you don’t manage how data is retrieved and transmitted. This post covers three practical tricks to boost GraphQL performance: client and server caching, request batching, and persisted queries. Implementing these strategies thoughtfully can reduce latency, lower bandwidth, and improve the scalability of your GraphQL API.
Caching GraphQL responses
Caching helps avoid recomputing or refetching data that hasn’t changed. There are two main caching layers to consider: client-side caching and server-side caching.
Client-side caching
- Use a normalized cache (e.g., InMemoryCache in Apollo Client) to store entities by their IDs and reconstruct results efficiently.
- Define type policies to customize cache keys and merge strategies for lists.
- Choose appropriate fetchPolicy per operation (e.g., cache-first for read-heavy UI, network-only for fresh data).
- Example (Apollo Client):

```js
import { ApolloClient, InMemoryCache, HttpLink } from '@apollo/client';
import fetch from 'cross-fetch';

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        product: {
          // Cache products by id, ignoring other args to keep a single source of truth.
          keyArgs: ['id'],
        },
        userPosts: {
          keyArgs: false,
          merge(existing = [], incoming) {
            return [...existing, ...incoming];
          },
        },
      },
    },
  },
});

const client = new ApolloClient({
  link: new HttpLink({ uri: '/graphql', fetch }),
  cache,
  defaultOptions: {
    watchQuery: { fetchPolicy: 'cache-first' },
    query: { fetchPolicy: 'network-only' },
  },
});
```
- Field-level TTLs, cache invalidation, and versioning help keep cached data fresh when updates occur.
Server-side caching
- Cache expensive resolvers or query results in a data layer (Redis, Memcached) or within the GraphQL server’s own cache; a sketch follows this list.
- Use TTLs and explicit invalidation when mutations occur.
- Consider per-user or per-query caching for personalized data, with careful invalidation logic.
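A minimal sketch of resolver-level Redis caching; the `product:<id>` key scheme, 60-second TTL, and `db.getProduct` helper are illustrative assumptions, not part of any specific framework:

```js
import { createClient } from 'redis';

const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();

const resolvers = {
  Query: {
    async product(_parent, { id }, { db }) {
      const cacheKey = `product:${id}`;

      // Serve from Redis when possible to skip the database entirely.
      const cached = await redis.get(cacheKey);
      if (cached) return JSON.parse(cached);

      const product = await db.getProduct(id); // assumed data-access helper
      await redis.set(cacheKey, JSON.stringify(product), { EX: 60 }); // 60s TTL
      return product;
    },
  },
};
```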
Cache invalidation and freshness
- Tie cache invalidation to mutations or data-change events, as in the sketch below.
- Use short TTLs for rapidly changing data and longer TTLs for stable data.
- Consider cache-busting keys that include a version or timestamp.
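Continuing the Redis sketch above, a mutation can evict exactly the entry it invalidates; `db.updateProduct` is again an assumed helper:

```js
const resolvers = {
  Mutation: {
    async updateProduct(_parent, { id, input }, { db }) {
      const product = await db.updateProduct(id, input); // assumed helper

      // Evict the stale entry so the next read refetches fresh data.
      await redis.del(`product:${id}`);
      return product;
    },
  },
};
```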
Practical tips
- Prefer cache-first or cache-and-network strategies for a responsive UI.
- Keep cache size in check; a bloated cache can degrade performance.
- Monitor cache hit rates and adjust policies accordingly.
Batching GraphQL requests
Batching combines multiple GraphQL operations into a single HTTP request, reducing the per-request overhead and network chatter. This is especially beneficial for UIs that trigger many small queries in quick succession.
How batching works
- Client-side: collect several operations into a batch array and send them together.
- Server-side: receive an array of GraphQL payloads and return an array of responses, maintaining the order of operations (the wire format is sketched below).
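On the wire, a batch is simply a JSON array of standard GraphQL payloads, and the response is an array in the same order. A minimal sketch with `fetch` (the operations are illustrative):

```js
// Two operations in one HTTP request; the response array preserves their order.
const response = await fetch('/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify([
    { query: '{ product(id: "1") { name } }' },
    { query: '{ userPosts { title } }' },
  ]),
});
const [productResult, postsResult] = await response.json();
```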
Client setup (Apollo)
- Use a batch link to group operations:

```js
import { BatchHttpLink } from '@apollo/client/link/batch-http';
import { ApolloClient, InMemoryCache } from '@apollo/client';

const batchLink = new BatchHttpLink({
  uri: '/graphql',
  batchMax: 5,       // maximum operations per batch
  batchInterval: 20, // wait time in ms to form a batch
});

const client = new ApolloClient({
  link: batchLink,
  cache: new InMemoryCache(),
});
```
- Tips:
- Set batchMax to a value that your server and network can handle comfortably.
- Use batchInterval to balance latency vs. payload size.
- Ensure your GraphQL server supports batched requests and returns a matching array of responses.
Server considerations
- Ensure the server can parse batched payloads and that each operation is executed independently; Apollo Server, for example, must opt in, as shown after this list.
- Be mindful of error handling: if one operation fails, others in the batch should still succeed.
- Avoid mixing mutations with queries in the same batch if your server’s batching implementation has limitations.
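For example, Apollo Server 4 only accepts batched payloads when you opt in. A minimal sketch, assuming `typeDefs` and `resolvers` are defined elsewhere:

```js
import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';

const server = new ApolloServer({
  typeDefs,
  resolvers,
  allowBatchedHttpRequests: true, // accept an array of operations in one request
});

const { url } = await startStandaloneServer(server, { listen: { port: 4000 } });
console.log(`GraphQL server ready at ${url}`);
```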
Tradeoffs
- Increased response size per batch can delay the first response; tune batchInterval to minimize user-perceived latency.
- Error isolation can be more complex when multiple operations share a single HTTP response.
Persisted queries
Persisted queries reduce payload size by sending a short query identifier (a hash) instead of the full query text, and can harden security when the server restricts execution to registered queries. The server maintains a registry of known queries; clients register a query once, then reuse its identifier for subsequent requests.
How persisted queries work
- The client computes a hash of the query text (e.g., SHA-256).
- The client sends only the hash in subsequent requests.
- If the server doesn’t recognize the hash, it returns a not-found error, and the client retries with the full query (plus the hash) so the server can register it. The wire exchange is sketched after this list.
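With Apollo’s automatic persisted queries (APQ) protocol, the hash travels in an `extensions` field. A minimal sketch of the optimistic request and the registration retry (the `post` helper is just for illustration):

```js
import { createHash } from 'node:crypto';

const query = '{ product(id: "1") { name } }';
const sha256Hash = createHash('sha256').update(query).digest('hex');
const extensions = { persistedQuery: { version: 1, sha256Hash } };

// Optimistic request: send the hash only, no query text.
let result = await post({ extensions });

// If the server hasn't seen the hash, retry once with the full text to register it.
if (result.errors?.some((e) => e.extensions?.code === 'PERSISTED_QUERY_NOT_FOUND')) {
  result = await post({ query, extensions });
}

async function post(body) {
  const res = await fetch('/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return res.json();
}
```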
Client setup (Apollo with persisted queries)
- Use a persisted query link to send hashes (Apollo Client 3 ships this link in the core package):

```js
import { createPersistedQueryLink } from '@apollo/client/link/persisted-queries';
import { ApolloClient, InMemoryCache, HttpLink } from '@apollo/client';
import { sha256 } from 'crypto-hash';

const httpLink = new HttpLink({ uri: '/graphql' });

// The link hashes each query and sends the hash first, falling back to
// the full text when the server doesn't recognize it.
const link = createPersistedQueryLink({ sha256 }).concat(httpLink);

const client = new ApolloClient({
  link,
  cache: new InMemoryCache(),
});
```
- Older stacks used the standalone apollo-link-persisted-queries package; with Apollo Client 3, prefer the built-in link shown above.
Server setup
- Enable persisted queries on the GraphQL server:
- For Apollo Server, you can configure a persisted-queries cache (LRU-based) to store query texts by hash:

```js
import { ApolloServer } from '@apollo/server';
import { InMemoryLRUCache } from '@apollo/utils.keyvaluecache';

const server = new ApolloServer({
  typeDefs,
  resolvers,
  persistedQueries: {
    // Bound the registry so it can't grow without limit (size is in bytes).
    cache: new InMemoryLRUCache({ maxSize: Math.pow(2, 20) * 10 }), // ~10 MB
  },
});
```
- When a query hash is not found, the server responds with a PERSISTED_QUERY_NOT_FOUND error, prompting the client to retry with the full query to register it.
When to use persisted queries
- Beneficial for mobile apps or bandwidth-constrained environments.
- Great for services with a stable set of queries and low mutation frequency.
- Helpful for security when combined with an operation allowlist, since the server can refuse queries that aren’t registered.
Pitfalls and best practices
- Ensure cache coherence: corrupted or out-of-date persisted queries can lead to failures.
- Maintain a predictable registry size and implement cleanup for stale queries.
- Combine with proper error fallback: if a persisted query is not found, fall back to sending the full query once to register it.
Putting it together: a practical setup
- Objective: a GraphQL client with caching, batching, and persisted queries enabled, plus a server configured for APQ and sensible caching; a combined client sketch follows this list.
- Client (conceptual)
- Enable InMemoryCache with thoughtful type policies.
- Use BatchHttpLink for batching.
- Use a persisted query link to leverage APQ when available.
- Server
- Enable a persisted queries cache (e.g., in-memory or Redis) with a reasonable eviction policy.
- Add optional memoization or a resolver-level cache for expensive fields.
- Metrics to watch
- Cache hit rate, average latency, batch response time, and persisted query registry hit/miss rates.
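A combined client sketch, composing the links from the earlier sections (same assumptions as before):

```js
import { ApolloClient, InMemoryCache } from '@apollo/client';
import { BatchHttpLink } from '@apollo/client/link/batch-http';
import { createPersistedQueryLink } from '@apollo/client/link/persisted-queries';
import { sha256 } from 'crypto-hash';

// The persisted-query link hashes each operation, then the batch link groups
// the resulting requests into a single HTTP call.
const link = createPersistedQueryLink({ sha256 }).concat(
  new BatchHttpLink({ uri: '/graphql', batchMax: 5, batchInterval: 20 })
);

const client = new ApolloClient({
  link,
  cache: new InMemoryCache(), // add type policies as shown in the caching section
});
```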
Pitfalls and debugging tips
- Do not over-couple policies: overly aggressive caching can serve stale data; always tie invalidation to mutations.
- Test batched requests thoroughly: ensure error handling isolates failures to individual operations.
- Validate persisted queries: start with a small set of key queries and monitor for not-found responses to adjust the registration flow.
- Use tracing and metrics: enable distributed tracing (e.g., Apollo Studio, OpenTelemetry) to observe cache, batch, and APQ behavior across services.
Conclusion
Caching, batching, and persisted queries are complementary techniques for improving GraphQL performance. Client-side caching reduces redundant data transfer, batching lowers HTTP overhead and latency for multiple operations, and persisted queries minimize payloads while constraining the data surface. When combined thoughtfully and with proper invalidation and server support, these tricks can yield noticeable gains in both speed and scalability for GraphQL-powered apps.