How PostgreSQL Change Data Capture (CDC) Works
Learn how PostgreSQL CDC captures row-level changes using WAL, logical replication, and tools like Debezium to stream database events in real time.
How PostgreSQL Change Data Capture (CDC) Works
Change Data Capture (CDC) is a technique for tracking row-level changes (INSERT, UPDATE, DELETE) in a database and streaming them to downstream consumers in real time.
Why CDC?
Traditional approaches (polling, dual writes) have significant drawbacks:
| Approach | Problem |
|---|---|
| Polling | Missed deletes, high DB load, latency |
| Dual writes | No atomicity — app writes to DB and message broker separately, risking inconsistency |
| Triggers | Tight coupling, performance overhead on write path |
| CDC (WAL-based) | Zero overhead on write path, captures all changes, exactly-once possible |
PostgreSQL WAL — The Foundation
PostgreSQL already writes every change to the Write-Ahead Log (WAL) for crash recovery. CDC simply reads this log.
Rendering diagram…
Logical Decoding Pipeline
PostgreSQL's logical decoding transforms binary WAL records into a structured change stream:
- WAL Writer — PostgreSQL writes binary WAL records during transactions.
- Replication Slot — Bookmarks the consumer's position; prevents WAL from being reclaimed prematurely.
- Output Plugin (e.g.,
pgoutput,wal2json) — Decodes binary WAL into structured change events. - CDC Connector (e.g., Debezium) — Reads decoded changes and publishes to Kafka / other sinks.
Rendering diagram…
Setting Up CDC in PostgreSQL
1. Enable logical replication
-- postgresql.conf
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4
2. Create a replication slot
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');
3. Create a publication
CREATE PUBLICATION my_pub FOR TABLE orders, customers;
4. Connect Debezium
Debezium reads from the slot and publishes each change as a Kafka message with:
- Key: primary key of the changed row
- Value: before/after images of the row + metadata (LSN, timestamp, transaction ID)
Key Considerations
- Slot monitoring — unused slots prevent WAL cleanup; disk can fill up.
- Schema changes — DDL isn't captured by default; Debezium handles this with schema history topics.
- Large transactions — very large transactions produce large WAL; consider
wal_sender_timeouttuning. - Exactly-once semantics — achievable with idempotent consumers or Kafka transactions.