How PostgreSQL Change Data Capture (CDC) Works

Learn how PostgreSQL CDC captures row-level changes using WAL, logical replication, and tools like Debezium to stream database events in real time.


How PostgreSQL Change Data Capture (CDC) Works

Change Data Capture (CDC) is a technique for tracking row-level changes (INSERT, UPDATE, DELETE) in a database and streaming them to downstream consumers in real time.


Why CDC?

Traditional approaches (polling, dual writes) have significant drawbacks:

ApproachProblem
PollingMissed deletes, high DB load, latency
Dual writesNo atomicity — app writes to DB and message broker separately, risking inconsistency
TriggersTight coupling, performance overhead on write path
CDC (WAL-based)Zero overhead on write path, captures all changes, exactly-once possible

PostgreSQL WAL — The Foundation

PostgreSQL already writes every change to the Write-Ahead Log (WAL) for crash recovery. CDC simply reads this log.

Rendering diagram…

Logical Decoding Pipeline

PostgreSQL's logical decoding transforms binary WAL records into a structured change stream:

  1. WAL Writer — PostgreSQL writes binary WAL records during transactions.
  2. Replication Slot — Bookmarks the consumer's position; prevents WAL from being reclaimed prematurely.
  3. Output Plugin (e.g., pgoutput, wal2json) — Decodes binary WAL into structured change events.
  4. CDC Connector (e.g., Debezium) — Reads decoded changes and publishes to Kafka / other sinks.
Rendering diagram…

Setting Up CDC in PostgreSQL

1. Enable logical replication

-- postgresql.conf
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4

2. Create a replication slot

SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');

3. Create a publication

CREATE PUBLICATION my_pub FOR TABLE orders, customers;

4. Connect Debezium

Debezium reads from the slot and publishes each change as a Kafka message with:

  • Key: primary key of the changed row
  • Value: before/after images of the row + metadata (LSN, timestamp, transaction ID)

Key Considerations

  • Slot monitoring — unused slots prevent WAL cleanup; disk can fill up.
  • Schema changes — DDL isn't captured by default; Debezium handles this with schema history topics.
  • Large transactions — very large transactions produce large WAL; consider wal_sender_timeout tuning.
  • Exactly-once semantics — achievable with idempotent consumers or Kafka transactions.

Further Reading