Building an Open API with Clear Rate Limits and Docs
#api-design
#openapi
#documentation
#rate-limiting
Introduction
Open APIs are powerful when they are both easy to use and reliably protected. Clear rate limits prevent abuse, protect your infrastructure, and give developers predictable boundaries. Coupled with comprehensive, generator-friendly docs, an OpenAPI-first approach helps teams design, implement, and evolve an API that scales with confidence.
In this post, you’ll learn how to design an OpenAPI specification that communicates rate limits clearly, implement robust rate limiting at runtime, and document limits so developers know exactly what to expect from each endpoint.
Principles of OpenAPI and Rate Limits
- Treat the OpenAPI spec as the contract for what you expose and how to use it, including performance expectations.
- Define rate limits as runtime guarantees, not just as a policy in a README. Document them where clients look: in the API docs and in the API responses.
- Use standard HTTP semantics: 429 Too Many Requests for exhausted limits, with headers that expose the current state (limit, remaining, reset time).
- Keep per-endpoint and per-key (or per-app) limits configurable to support different plans or tiers without touching code.
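To make the header semantics above concrete, here is a minimal sketch in plain JavaScript. `buildRateLimitHeaders` is a hypothetical helper name, not a library API; it simply assembles the headers a response could carry, adding `Retry-After` once the limit is exhausted.

```javascript
// Assemble rate-limit headers for an HTTP response.
// buildRateLimitHeaders is an illustrative helper, not a library API.
function buildRateLimitHeaders(limit, remaining, resetEpochSeconds) {
  const headers = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, remaining)),
    'X-RateLimit-Reset': String(resetEpochSeconds),
  };
  if (remaining <= 0) {
    // When the limit is exhausted, tell clients how long to back off.
    const nowSeconds = Math.floor(Date.now() / 1000);
    headers['Retry-After'] = String(Math.max(0, resetEpochSeconds - nowSeconds));
  }
  return headers;
}
```

A handler would merge these headers into every response and pair them with a 429 status once `remaining` hits zero.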
Rate-Limiting Strategies
- Fixed Window vs Sliding Window
- Fixed Window: Simple, per-window counters (e.g., per minute). Easy to implement but can produce bursts at window boundaries.
- Sliding Window: More even distribution by considering recent requests within a rolling window.
- Token Bucket / Leaky Bucket
- Token Bucket: Allows bursts up to a burst capacity, then refills tokens at a steady rate.
- Leaky Bucket: Enforces a steady outflow rate, smoothing traffic.
- Per-key vs Per-IP
- Per-key (e.g., API key or OAuth token): Aligns with access plans and paid tiers.
- Per-IP: Useful for protecting public endpoints, often combined with per-key limits for authenticated clients.
- Hard vs Soft Limits
- Hard limit: Strictly enforced; requests beyond the cap are rejected immediately.
- Soft limit with a grace period: Warns or allows slight overages before applying a hard block, useful for legitimate bursts.
- Global vs Endpoint-specific
- Global: A single cap across all endpoints.
- Endpoint-specific: Different limits per resource, reflecting varying load or business importance.
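The token-bucket variant above can be sketched in a few lines. This is an illustrative, single-process, in-memory version (class and method names are ours, not a library API):

```javascript
// Minimal token-bucket limiter sketch: allows bursts up to `capacity`,
// then refills at `refillPerSecond` tokens per second.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so initial bursts are allowed
    this.lastRefill = now;
  }

  // Returns true if a request may proceed, false if it should be limited.
  tryRemove(now = Date.now()) {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // request rejected (would map to HTTP 429)
  }
}
```

With `capacity = 3` and one token per second, three requests pass immediately, the fourth is rejected, and roughly one more is admitted each second thereafter.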
Tips:
- Prefer per-key limits with a clear plan structure to balance security and developer experience.
- Provide a clear Retry-After value and a predictable reset moment to help clients recover gracefully.
Documenting Rate Limits in OpenAPI
Documenting limits in OpenAPI helps consumers understand what to expect without digging into external docs. Use standard responses and headers to convey limits and failures.
Code example (OpenAPI YAML excerpt):
paths:
  /items:
    get:
      summary: Retrieve items
      responses:
        '200':
          description: OK
          headers:
            X-RateLimit-Limit:
              description: Maximum requests allowed in the current window
              schema:
                type: integer
            X-RateLimit-Remaining:
              description: Remaining requests in the current window
              schema:
                type: integer
            X-RateLimit-Reset:
              description: UTC epoch seconds when the current window resets
              schema:
                type: integer
        '429':
          description: Too Many Requests
          headers:
            Retry-After:
              description: Seconds to wait before retrying
              schema:
                type: integer
Guidelines:
- Include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset in successful responses where applicable.
- Document 429 responses with a Retry-After header to help clients back off.
- Consider a small section in your docs that maps plan tiers to per-key limits and explains how to request higher ones.
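One simple way to keep per-key limits configurable by plan is a small lookup table that the limiter consults at request time. The tier names and numbers below are purely illustrative:

```javascript
// Hypothetical plan-tier table mapping plans to per-window limits.
// Values are illustrative; in production this might live in a config
// store or database so limits can change without a deploy.
const PLAN_LIMITS = {
  free: { requestsPerMinute: 60, burst: 10 },
  pro: { requestsPerMinute: 600, burst: 50 },
  enterprise: { requestsPerMinute: 6000, burst: 200 },
};

function limitForPlan(plan) {
  // Fall back to the most restrictive tier for unknown plans.
  return PLAN_LIMITS[plan] ?? PLAN_LIMITS.free;
}
```

Publishing the same table in your docs keeps the documented limits and the enforced limits traceable to one source.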
Implementing Rate Limiting in Your API
Choose a backend and a strategy that matches your scale and tech stack. Here are simple examples to illustrate approaches.
Node.js with Express (in-memory limiter; for production use Redis or a distributed store):
const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per window
  standardHeaders: true, // send draft-standard RateLimit-* headers
  legacyHeaders: true // also send the X-RateLimit-* headers documented in the spec
});

app.use('/api/', limiter);

app.get('/api/items', (req, res) => {
  res.json({ items: [] });
});

app.listen(3000, () => console.log('API listening on port 3000'));
Notes:
- For multi-instance environments, store counters in Redis or a database to keep limits consistent across instances.
- Tailor limits per API key or user role by swapping the limiter’s key function or applying per-route middlewares.
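For per-key limiting, the key function can prefer an API key header and fall back to the client IP for anonymous traffic. The sketch below is a plain function (the `x-api-key` header name is an assumption); with express-rate-limit it would plug in via the `keyGenerator` option, e.g. `keyGenerator: (req) => rateLimitKey(req.headers, req.ip)`:

```javascript
// Derive the rate-limit bucket key: per-API-key for authenticated
// clients, per-IP for anonymous ones. Header name is illustrative.
function rateLimitKey(headers, ip) {
  return headers['x-api-key'] || ip;
}
```

Keying on the API key rather than the IP lets limits follow the client's plan even when requests arrive through shared NATs or proxies.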
Alternative approaches:
- Python FastAPI with slowapi or aioredis-backed rate limits for asynchronous apps.
- NGINX or API gateways for centralized rate limiting at the edge.
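For edge enforcement with NGINX, a sketch might look like the following. The zone name, rates, and upstream are illustrative, not a recommendation for your traffic profile:

```nginx
# Define a shared zone keyed by client IP: 10 MB of state, 10 req/s steady rate.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        # Allow bursts of up to 20 queued requests, served without delay.
        limit_req zone=api burst=20 nodelay;
        # Return 429 (instead of the default 503) when the limit is exceeded.
        limit_req_status 429;
        proxy_pass http://backend;
    }
}
```

Edge limiting protects the whole fleet before requests reach application code, and it composes with per-key limits applied deeper in the stack.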
Generating and Hosting Docs
- Maintain a single OpenAPI spec (YAML or JSON) that reflects your rate limit semantics.
- Use a docs UI that pulls from the spec:
- Swagger UI for interactive exploration
- Redoc for clean, production-like docs
- Automate doc generation as part of your CI/CD to ensure docs stay in sync with code.
- Consider hosting docs alongside your API, or as a separate static site with a versioned spec per release.
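As one possible automation step, a CI job can lint the spec and build static docs on every push. The snippet below assumes GitHub Actions and the Redocly CLI; substitute your own pipeline and tooling:

```yaml
# Illustrative CI steps: validate the spec, then build static docs from it.
- name: Lint OpenAPI spec
  run: npx @redocly/cli lint openapi.yaml
- name: Build static docs
  run: npx @redocly/cli build-docs openapi.yaml -o docs/index.html
```

Failing the build on lint errors keeps the published docs from drifting away from the contract the API actually enforces.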
Best practices:
- Version your OpenAPI spec and tag changes that affect rate limits.
- Provide a clear changelog entry when limits are adjusted.
- Include examples showing successful requests and 429 responses with Retry-After guidance.
Testing, Observability, and Deployment
- Test rate limits under load with both unit tests and integration tests. Validate:
- Correct rate limit counters per key or IP
- Proper 429 responses with Retry-After headers
- Header values in both success and error responses
- Observability:
- Emit metrics for requests per key, per endpoint, and per plan
- Track the 429 rate to identify potential abuse or limits that are too tight for legitimate demand
- Alert on sustained spikes that could indicate misuse or misconfigurations
- Deployment strategies:
- Start with a conservative default, then increase limits gradually based on observed traffic and capacity
- Provide a process for clients to request higher limits when needed
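A self-contained sketch of the kind of check an integration test would make, using a toy in-memory fixed-window limiter in place of the real stack (all names here are ours; in practice you would drive the actual API with a tool like supertest):

```javascript
// Toy fixed-window limiter: counts requests per key in the current window.
// Window rotation is omitted to keep the sketch focused on the assertions.
function makeFixedWindowLimiter(maxPerWindow) {
  const counts = new Map(); // key -> request count in the current window
  return function handle(key) {
    const count = (counts.get(key) ?? 0) + 1;
    counts.set(key, count);
    if (count > maxPerWindow) {
      return { status: 429, headers: { 'Retry-After': '60' } };
    }
    return {
      status: 200,
      headers: { 'X-RateLimit-Remaining': String(maxPerWindow - count) },
    };
  };
}

// Fire n requests for one key and collect the responses, as a load test would.
function simulate(limiter, key, n) {
  const results = [];
  for (let i = 0; i < n; i++) results.push(limiter(key));
  return results;
}
```

A test then asserts exactly the three properties listed above: counters per key, a 429 with `Retry-After` once the cap is exceeded, and correct header values on successful responses.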
Conclusion
A well-designed OpenAPI, paired with thoughtful runtime rate limiting and clear, discoverable docs, creates a robust foundation for scalable APIs. By communicating limits in the spec, enforcing them consistently at runtime, and presenting intuitive documentation and examples, you empower developers to build on your API confidently while protecting your services from abuse.