APACHE KAFKA

Distributed Event Streaming · Cheat Sheet
v3.x
Brokers · Topics · Consumers · Producers
CLI · Configs · Architecture
Message Retention · configurable by time or size
Replication · leader plus follower replicas, ISR-based fault tolerance
Read Complexity · O(1), sequential disk access
Latency · milliseconds end-to-end at scale

Core Concepts

Fundamentals
🗂 Topic
A named, ordered, immutable log of records. Producers write to topics; consumers read from them. Topics are split into partitions for parallelism.
📦 Partition
Unit of parallelism within a topic. Messages in a partition are ordered and assigned a sequential offset. Partitions are distributed across brokers.
⬆ Producer
Publishes records to topics. Chooses a partition via key hash, round-robin, or a custom partitioner. Controls acks for durability guarantees.
⬇ Consumer
Reads records from partitions. Tracks position via committed offsets. Multiple consumers in a group divide partitions for parallel consumption.
👥 Consumer Group
Set of consumers sharing a group.id. Each partition is assigned to exactly one consumer in the group. Enables horizontal scaling; groups are independent.
🖥 Broker
A Kafka server that stores and serves partitions. Each broker handles a subset of topic partitions. One broker acts as controller for the cluster.
# Offset
Integer position of a record in a partition (0-based). Consumers commit offsets to track progress. Can reset to earliest, latest, or a specific position.
🔁 Replication
Each partition has a leader replica and N-1 followers. Producers and consumers interact with the leader. The ISR (in-sync replica) set ensures durability.
🗜 Log Compaction
Retention policy that keeps only the latest value per key. Deletes older records with duplicate keys. Useful for changelog/event-sourcing patterns.
🏷 Record / Message
Unit of data: optional key, value bytes, timestamp, optional headers. The key determines partition assignment. The value is opaque bytes (Avro, JSON, etc.).
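The routing and group-assignment rules above can be sketched in a few lines of Python. This is a toy model only: Kafka's default partitioner actually hashes the key with murmur2 (CRC32 below is a deterministic stand-in), and the real RangeAssignor works per topic with full group metadata.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the record key.
    Real Kafka uses murmur2; CRC32 is a stand-in for illustration."""
    return zlib.crc32(key) % num_partitions

def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Simplified RangeAssignor: contiguous blocks of partitions,
    earlier (sorted) consumers absorb the remainder."""
    assignment = {c: [] for c in consumers}
    per, extra = divmod(len(partitions), len(consumers))
    i = 0
    for idx, c in enumerate(sorted(consumers)):
        count = per + (1 if idx < extra else 0)
        assignment[c] = partitions[i:i + count]
        i += count
    return assignment

# Same key always lands on the same partition (per-key ordering).
assert partition_for(b"user-42", 3) == partition_for(b"user-42", 3)

# 3 partitions across 2 consumers: one consumer gets the extra partition.
print(range_assign([0, 1, 2], ["c1", "c2"]))  # → {'c1': [0, 1], 'c2': [2]}
```

Note how each partition goes to exactly one consumer, which is why a group with more consumers than partitions leaves some consumers idle.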

Data Flow

Architecture
[Diagram: Producers → Broker Cluster → Consumers]
Partition replication: Brokers 1-3 each hold replicas of partitions P0-P2; Broker 1 leads P0, Broker 2 leads P1, Broker 3 leads P2. Clients talk only to each partition's leader.
Consumer offset tracking: a partition holds messages M0-M6; offsets up to the committed mark are durable progress, the current offset is the next read position, and everything beyond it is unread.
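The offset-tracking picture can be modeled directly. `PartitionCursor` is a hypothetical name, not a Kafka class; it only mimics how a consumer's committed offset trails its current poll position.

```python
class PartitionCursor:
    """Toy model of consumer offset tracking in one partition.
    Offsets below `committed` are durable progress; `position` is the
    next record poll() would return; anything past it is unread."""
    def __init__(self, log):
        self.log = log          # records in the partition, offset = index
        self.position = 0       # next offset to read
        self.committed = 0      # last durably committed offset

    def poll(self, max_records=1):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position

    def seek(self, offset):
        self.position = max(0, min(offset, len(self.log)))

cur = PartitionCursor(["M0", "M1", "M2", "M3", "M4"])
cur.poll(3)      # read M0..M2
cur.commit()     # committed = 3: a restart resumes here
cur.poll(1)      # position = 4; M4 is still unread
print(cur.committed, cur.position)  # → 3 4
```

If the consumer crashes after the second poll() but before committing, a restart resumes from offset 3 and re-reads M3, which is exactly the at-least-once duplicate case.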

Topic Commands

kafka-topics.sh
Create & Delete
$ kafka-topics.sh --create --bootstrap-server localhost:9092 --topic my-topic --partitions 3 --replication-factor 2
  Create topic with 3 partitions, RF=2
$ kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic my-topic
  Delete topic permanently
Inspect
$ kafka-topics.sh --list --bootstrap-server localhost:9092
  List all topics
$ kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-topic
  Show partitions, leaders, ISR
$ kafka-topics.sh --describe --bootstrap-server localhost:9092 --under-replicated-partitions
  Find under-replicated partitions
Alter
$ kafka-topics.sh --alter --bootstrap-server localhost:9092 --topic my-topic --partitions 6
  Increase partition count only (decreasing is not supported)
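A caveat worth knowing before running --alter: key-to-partition routing is a hash modulo the partition count, so adding partitions remaps some keys and breaks per-key ordering across the change. A toy sketch (CRC32 stands in for Kafka's murmur2 partitioner):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Simplified stand-in for Kafka's murmur2-based default partitioner.
    return zlib.crc32(key) % num_partitions

keys = [b"order-1", b"order-2", b"order-3", b"order-4"]
before = {k: partition_for(k, 3) for k in keys}   # topic with 3 partitions
after = {k: partition_for(k, 6) for k in keys}    # after --alter --partitions 6

# Any remapped key loses its ordering guarantee: its newer records land
# on a different partition than its older ones.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys remapped")
```

This is why partition counts are usually sized generously up front for keyed topics.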

Produce & Consume

Console Tools
Console Producer
$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-topic
  Send messages interactively (stdin)
$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-topic --property parse.key=true --property key.separator=:
  Send keyed messages (key:value per line)
Console Consumer
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning
  Read all messages from the start
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --group my-group --max-messages 100
  Consume as part of a group, stop after 100 messages
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --partition 0 --offset 42
  Read from a specific partition and offset
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --property print.key=true --property print.timestamp=true
  Print keys and timestamps with messages

Consumer Groups

kafka-consumer-groups.sh
Monitor & Inspect
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
  List all consumer groups
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group
  Show lag, offsets, member assignment
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups
  Describe all groups at once
Offset Reset
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --reset-offsets --to-earliest --topic my-topic --execute
  Reset to the beginning of the topic
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --reset-offsets --to-datetime 2024-01-01T00:00:00.000 --topic my-topic --execute
  Reset to a specific timestamp
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --reset-offsets --shift-by -100 --all-topics --execute
  Shift back 100 messages across all topics
Delete
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group my-group
  Remove an inactive consumer group (must have no active members)
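The reset strategies above reduce to simple arithmetic on offsets, clamped to the partition's valid range. A sketch under that assumption (`reset_offset` is a hypothetical helper for illustration, not part of any Kafka API):

```python
def reset_offset(committed, earliest, latest, mode, arg=None):
    """Toy model of kafka-consumer-groups.sh --reset-offsets strategies.
    Results are clamped to the partition's [earliest, latest] range,
    since old segments may already be deleted by retention."""
    if mode == "to-earliest":
        target = earliest
    elif mode == "to-latest":
        target = latest
    elif mode == "shift-by":        # arg may be negative
        target = committed + arg
    elif mode == "to-offset":
        target = arg
    else:
        raise ValueError(f"unknown mode: {mode}")
    return max(earliest, min(target, latest))

# Committed at 500; partition currently holds offsets 120..900:
print(reset_offset(500, 120, 900, "shift-by", -100))   # → 400
print(reset_offset(500, 120, 900, "shift-by", -1000))  # → 120 (clamped)
```

The clamping matters in practice: shifting back past retention silently lands you at the earliest surviving offset.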

Producer Config

Key Settings
  • acks=all · 0 = fire-and-forget, 1 = leader only, all = full ISR (strongest)
  • retries=2147483647 · max retry attempts; use with idempotence
  • enable.idempotence=true · no duplicates from retries; foundation for exactly-once
  • batch.size=16384 · max bytes per batch (increase for throughput)
  • linger.ms=5 · wait before sending a batch (0 = lowest latency)
  • compression.type=snappy · none, gzip, snappy, lz4, zstd
  • max.in.flight.requests.per.connection=5 · set to 1 for strict ordering without idempotence
  • delivery.timeout.ms=120000 · upper bound on total delivery time
  • buffer.memory=33554432 · total memory for buffering unsent records
  • key.serializer=StringSerializer · key serializer class
  • value.serializer=StringSerializer · value serializer class
  • partitioner.class=DefaultPartitioner · custom partitioner for routing logic
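Putting the durability-related settings together, a producer config for safe, ordered delivery might look like the fragment below; values are taken from the table above and should be tuned per workload.

```properties
# Durable, ordered producer (values from the table above)
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5
delivery.timeout.ms=120000
compression.type=snappy
linger.ms=5
batch.size=16384
```

With idempotence enabled, up to 5 in-flight requests still preserve per-partition ordering, so there is no need to drop to 1.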

Consumer Config

Key Settings
  • group.id=my-app-group · consumer group identifier (required for group management)
  • auto.offset.reset=earliest · earliest | latest | none (when no prior offset exists)
  • enable.auto.commit=true · false = manual offset control
  • auto.commit.interval.ms=5000 · frequency of automatic offset commits
  • fetch.min.bytes=1 · min data before the server responds to a fetch
  • fetch.max.wait.ms=500 · max wait if fetch.min.bytes is not satisfied
  • max.poll.records=500 · max records per poll() call
  • max.poll.interval.ms=300000 · max time between polls before rebalance
  • session.timeout.ms=45000 · heartbeat timeout before the member is removed
  • heartbeat.interval.ms=3000 · frequency of heartbeats to the group coordinator
  • isolation.level=read_committed · read_uncommitted | read_committed (for EOS)
  • partition.assignment.strategy=RangeAssignor · RangeAssignor, RoundRobinAssignor, StickyAssignor, CooperativeStickyAssignor
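Combining these, a typical at-least-once consumer with manual commits might be configured as below (values from the table above; adjust for your workload):

```properties
# At-least-once consumer with manual offset control
group.id=my-app-group
enable.auto.commit=false
auto.offset.reset=earliest
max.poll.records=500
max.poll.interval.ms=300000
isolation.level=read_committed
```

Committing manually after processing (not after poll) is what turns this into at-least-once rather than at-most-once.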

Broker & Topic Config

server.properties
// Broker
  • num.partitions=1 · default partitions for new topics
  • default.replication.factor=1 · default RF (use 3 in production)
  • min.insync.replicas=2 · min ISR for acks=all writes to succeed
  • log.retention.hours=168 · data retention period (168 h = 7 days)
  • log.retention.bytes=-1 · max log size per partition (-1 = unlimited)
  • log.segment.bytes=1073741824 · max segment file size (1 GB)
  • num.network.threads=3 · threads for network I/O
  • num.io.threads=8 · threads for disk I/O
// Per-Topic Override (kafka-configs.sh)
$ kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=86400000
  Set 1-day retention on a specific topic

Admin & Diagnostics

Operations
Cluster Info
$ kafka-broker-api-versions.sh --bootstrap-server localhost:9092
  List broker API versions
$ kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe --topic-list my-topic
  Show log directories and sizes
$ kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic my-topic
  Get latest offsets per partition
Performance Testing
$ kafka-producer-perf-test.sh --topic perf-test --num-records 1000000 --record-size 1000 --throughput -1 --producer-props bootstrap.servers=localhost:9092
  Produce 1M records at maximum throughput
$ kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --topic perf-test --messages 1000000
  Consume 1M records and report throughput
KRaft Mode (Kafka 3.x, ZooKeeper-less)
$ kafka-storage.sh format -t $(kafka-storage.sh random-uuid) -c config/kraft/server.properties
  Format storage for KRaft mode
$ kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
  Check KRaft quorum status

Delivery Semantics

Guarantees
At-Most-Once (acks=0, retries=0)
Messages may be lost. Highest throughput. Use when loss is acceptable.
At-Least-Once (acks=all, retries > 0)
No loss, but duplicates are possible. The default choice for most workloads.
Exactly-Once / EOS (idempotence + transactions)
No loss, no duplicates. Requires enable.idempotence=true plus the transaction API.
Transactions (transactional.id=X)
Atomic writes across multiple partitions/topics. Pair with isolation.level=read_committed on the consumer.
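The idempotence half of EOS can be sketched as follows. `IdempotentLog` is a toy model, not broker code: real Kafka scopes sequence numbers per (producer id, partition) and also handles producer epochs, but the dedup idea is the same.

```python
class IdempotentLog:
    """Toy model of broker-side idempotent-producer dedup.
    Each producer session carries a monotonically increasing sequence
    number; a retry resends the same sequence and the broker drops it."""
    def __init__(self):
        self.records = []
        self.last_seq = {}   # producer_id -> last accepted sequence

    def append(self, producer_id, seq, value):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False                  # duplicate from a retry: dropped
        self.last_seq[producer_id] = seq
        self.records.append(value)
        return True

log = IdempotentLog()
log.append("p1", 0, "a")
log.append("p1", 1, "b")
log.append("p1", 1, "b")    # network timeout caused a retry of seq 1
print(log.records)           # → ['a', 'b']
```

Without the sequence check, the retry would append "b" twice, which is exactly the at-least-once duplicate that idempotence removes.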

Common Patterns

Recipes
Pattern             Key Config                                        Notes
High throughput     linger.ms=20, batch.size=65536, compression=lz4   Trade latency for throughput
Low latency         linger.ms=0, acks=1                               Accept weaker guarantees
Dead letter queue   separate topic, catch exceptions                  Route failed messages to a DLQ topic
Fan-out             multiple consumer groups                          Each group reads all messages
Queue               1 partition, 1 consumer group                     FIFO processing
Log compaction      cleanup.policy=compact                            Latest value per key retained
Event sourcing      retention.ms=-1, cleanup.policy=compact           Infinite retention + compaction
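The compaction rows above boil down to "latest value per key wins". A toy sketch of that rule (real compaction runs per segment in the background and keeps tombstones for delete.retention.ms before dropping them):

```python
def compact(log):
    """Toy log compaction: keep only the latest value per key.
    A None value is a tombstone and eventually removes the key entirely."""
    latest = {}
    for key, value in log:            # later records overwrite earlier ones
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = [
    ("user-1", "v1"),
    ("user-2", "v1"),
    ("user-1", "v2"),   # newer value for user-1
    ("user-2", None),   # tombstone: delete user-2
]
print(compact(log))  # → [('user-1', 'v2')]
```

This is why a compacted topic works as a changelog: replaying it from the start rebuilds the latest state per key, which is the basis of the event-sourcing pattern in the last row.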