THE MODERN CHRONICLES


Interview Questions on Kafka — Topics, Partitions, Consumer Groups, and What Backend Interviews Actually Test

Apache Kafka is the backbone of event-driven architecture at companies like Flipkart, Swiggy, Razorpay, and every major Indian fintech. If you are interviewing for a backend or data engineering role, Kafka questions are coming. Here is what they ask.


Kafka processes trillions of messages daily at companies like LinkedIn, Uber, and Netflix, and it is a staple at major Indian fintech and e-commerce companies.

Why Kafka Is on Every Backend Interview

Apache Kafka is a distributed event streaming platform used for real-time data pipelines and event-driven microservices. In India, it is used by most major product companies — Flipkart for order processing, Swiggy for delivery tracking, Razorpay for payment events, and PhonePe for transaction streaming. Service companies serving these clients also need Kafka expertise.

Kafka interviews test your understanding of distributed messaging, partitioning strategy, consumer group mechanics, and how to handle failures. This guide covers the questions that actually get asked — from architecture basics to production scenarios.

Kafka Architecture

Q1: What is Kafka? Explain its core components.

Kafka = distributed event streaming platform

Core components:

Producer → publishes messages to topics
Consumer → reads messages from topics
Broker   → Kafka server that stores messages
Topic    → category/feed name for messages
Partition → subdivision of a topic (parallelism)
Offset   → unique ID of a message within a partition
ZooKeeper/KRaft → cluster coordination (metadata)

Architecture:
┌──────────┐     ┌─────────────────────┐     ┌──────────┐
│ Producer │ ──→ │   Kafka Cluster     │ ──→ │ Consumer │
│ Producer │ ──→ │  Broker 1 │ Broker 2│ ──→ │ Consumer │
│ Producer │ ──→ │  Broker 3 │ Broker 4│ ──→ │ Consumer │
└──────────┘     └─────────────────────┘     └──────────┘

Key characteristics:
- Distributed (runs on multiple servers)
- Fault-tolerant (data replicated across brokers)
- High throughput (millions of messages/second)
- Persistent (messages stored on disk)
- Pull-based (consumers pull, not push)

Q2: What are topics and partitions? Why partition?

Topic = logical channel for messages
  Example: "orders", "payments", "user-events"

Partition = physical subdivision of a topic
  Topic "orders" with 3 partitions:
  
  Partition 0: [msg0, msg3, msg6, msg9...]
  Partition 1: [msg1, msg4, msg7, msg10...]
  Partition 2: [msg2, msg5, msg8, msg11...]

Why partition?
1. Parallelism — multiple consumers read simultaneously
2. Scalability — partitions spread across brokers
3. Ordering — messages within a partition are ordered
   (but NOT across partitions)

Partition key determines which partition a message goes to:
- Same key → same partition (ordering guaranteed)
- No key → round-robin across partitions (sticky batching in newer clients)

Example: partition by user_id
- All events for user_123 go to same partition
- Guarantees order for that user's events
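The key-to-partition mapping can be sketched in a few lines. This is a toy illustration, not real client code: actual Kafka producers hash keys with murmur2, so `crc32` here is a deterministic stand-in.

```python
import zlib

def partition_for(key, num_partitions):
    """Deterministic key -> partition mapping. Real Kafka clients
    hash keys with murmur2; crc32 is a stand-in for illustration."""
    return zlib.crc32(key.encode()) % num_partitions

p = partition_for("user_123", 3)
assert partition_for("user_123", 3) == p   # same key, same partition, every time
```

Because the mapping is deterministic, all of `user_123`'s events land on one partition, which is exactly what gives per-user ordering.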

Q3: What are consumer groups? How does rebalancing work?

Consumer Group = set of consumers that share the work

Topic with 4 partitions, Consumer Group with 2 consumers:
  Consumer A reads: Partition 0, Partition 1
  Consumer B reads: Partition 2, Partition 3

Rules:
- Each partition is consumed by EXACTLY ONE consumer in a group
- One consumer can read multiple partitions
- If consumers > partitions, extra consumers are idle

Rebalancing happens when:
- New consumer joins the group
- Consumer leaves (crash or shutdown)
- New partitions added to topic

During rebalancing:
- All consumers stop reading briefly
- Partitions are reassigned
- Consumers resume from last committed offset

// This is why you should not have more consumers
// than partitions — extras just sit idle
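The assignment rules above can be modeled with a small function — a simplified version of Kafka's range assignor, shown only to make the "one partition, one consumer" and "idle extra consumer" rules concrete:

```python
def range_assign(partitions, consumers):
    """Split partitions contiguously across consumers, mirroring
    Kafka's default range assignor (simplified, single topic)."""
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

range_assign([0, 1, 2, 3], ["A", "B"])   # A -> [0, 1], B -> [2, 3]
range_assign([0, 1], ["A", "B", "C"])    # C -> [] (idle consumer)
```

Rebalancing is just this function re-run over the new member list — which is why every membership change briefly pauses consumption.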

Reliability and Delivery Guarantees

Q4: What are the delivery semantics in Kafka?

At-most-once:
- Message may be lost, never duplicated
- Consumer commits offset BEFORE processing
- If processing fails, message is skipped
- Use case: metrics, logs (loss is acceptable)

At-least-once (default):
- Message is never lost, may be duplicated
- Consumer commits offset AFTER processing
- If commit fails, message is reprocessed
- Use case: most applications (handle duplicates)

Exactly-once:
- Message is processed exactly once
- Requires idempotent producer + transactional API
- Kafka supports this since version 0.11
- Use case: financial transactions, billing

// Producer acks setting:
// acks=0  → fire and forget (fastest, least safe)
// acks=1  → leader acknowledges (balanced)
// acks=all → all replicas acknowledge (safest, slowest)
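The difference between at-most-once and at-least-once comes down to whether the offset commit happens before or after processing. This toy simulation (not real consumer code) crashes mid-message and restarts from the last committed offset to show both outcomes:

```python
def run(messages, commit_first, crash_index):
    """Process messages, crash while handling messages[crash_index],
    then restart once from the last committed offset."""
    processed, committed = [], 0
    for i in range(len(messages)):
        if commit_first:
            committed = i + 1               # at-most-once: commit BEFORE processing
            if i == crash_index:
                break                       # crash: this message is lost
            processed.append(messages[i])
        else:
            processed.append(messages[i])   # at-least-once: process first
            if i == crash_index:
                break                       # crash: offset never committed
            committed = i + 1
    for i in range(committed, len(messages)):  # restart, resume from commit
        processed.append(messages[i])
    return processed

msgs = ["m0", "m1", "m2"]
run(msgs, commit_first=True, crash_index=1)    # ['m0', 'm2'] -- m1 lost
run(msgs, commit_first=False, crash_index=1)   # ['m0', 'm1', 'm1', 'm2'] -- m1 duplicated
```

Same crash, same messages — only the commit order changes, and that alone decides whether you lose a message or reprocess one.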

Q5: What is the replication factor? How does Kafka handle broker failure?

Replication factor = number of copies of each partition

Topic "orders" with replication factor 3:
  Partition 0: Broker 1 (leader), Broker 2, Broker 3
  Partition 1: Broker 2 (leader), Broker 3, Broker 1

Leader: handles all reads and writes for a partition
Followers: replicate data from leader

ISR (In-Sync Replicas):
- Followers that are caught up with the leader
- If leader fails, a new leader is elected from ISR

Broker failure scenario:
1. Broker 1 goes down
2. Partitions where Broker 1 was leader need new leaders
3. Kafka elects new leaders from ISR
4. Producers/consumers automatically redirect
5. When Broker 1 comes back, it catches up as follower

// Best practice: replication factor = 3
// Can tolerate 1 broker failure without data loss

Kafka's partition and replication model is what makes it fault-tolerant. Understanding this is critical for senior interviews.
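The failover steps can be sketched as a toy model. For simplicity it assumes every follower is in the ISR; real leader election also checks sync status:

```python
def elect_leaders(replicas_by_partition, failed_broker):
    """Drop the failed broker from each replica list; the first
    surviving replica becomes the new leader (toy model: all
    followers are assumed in-sync)."""
    return {p: [b for b in replicas if b != failed_broker]
            for p, replicas in replicas_by_partition.items()}

# Replica list order: [leader, follower, follower], replication factor 3
topic = {0: [1, 2, 3], 1: [2, 3, 1]}
after = elect_leaders(topic, failed_broker=1)
# partition 0's leader moves from broker 1 to broker 2;
# partition 1 keeps broker 2 as leader
```

With replication factor 3, losing one broker still leaves two in-sync copies of every partition, which is why 3 is the standard production setting.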

Practical and Scenario Questions

Q6: How do you decide the number of partitions for a topic?

Factors:
1. Expected throughput — more partitions = more parallelism
2. Number of consumers — partitions should be ≥ consumers in the largest consumer group
3. Message ordering — global ordering requires 1 partition (but loses parallelism)
4. Broker capacity — each partition uses memory and file handles

Rule of thumb: start with number of brokers × number of consumers, then tune based on load testing.
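As a back-of-the-envelope calculation (the throughput figures below are illustrative assumptions, not benchmarks):

```python
import math

def suggest_partitions(target_mb_per_s, per_partition_mb_per_s, max_group_size):
    """Enough partitions to hit the target throughput, and at least
    one per consumer in the largest expected consumer group."""
    by_throughput = math.ceil(target_mb_per_s / per_partition_mb_per_s)
    return max(by_throughput, max_group_size)

# e.g. 100 MB/s target, ~10 MB/s per partition, 12 consumers -> 12
suggest_partitions(100, 10, 12)
```

Here the consumer count dominates; with only 4 consumers the throughput requirement would set the floor at 10 instead.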

Q7: Kafka vs RabbitMQ — when to use which?

Kafka                        RabbitMQ
─────────────────────        ─────────────────────
Log-based (append-only)      Queue-based (message deleted)
Pull model                   Push model
High throughput (millions/s) Lower throughput (thousands/s)
Messages persist on disk     Messages deleted after consume
Replay possible              No replay
Better for streaming         Better for task queues
Consumer groups              Competing consumers

Use Kafka when:
- Event streaming / event sourcing
- High throughput needed
- Need to replay messages
- Multiple consumers for same data

Use RabbitMQ when:
- Task queue / work distribution
- Complex routing needed
- Request-reply pattern
- Lower latency per message

Q8: How do you handle message ordering in Kafka?

Key insight: Kafka guarantees ordering WITHIN a partition, not across partitions. To ensure ordering for related messages, use the same partition key. Example: for an e-commerce order, use order_id as the partition key — all events for that order (created, paid, shipped, delivered) go to the same partition and are consumed in order.

If you need global ordering across all messages, use a single partition — but this eliminates parallelism and becomes a bottleneck at scale. The trade-off between ordering and throughput is a common interview discussion point.
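The order_id example can be made concrete with a small simulation — `crc32` below stands in for the client's murmur2 hash, and the event names are illustrative:

```python
import zlib
from collections import defaultdict

def partition_for(key, num_partitions):
    # crc32 stands in for murmur2; same key -> same partition
    return zlib.crc32(key.encode()) % num_partitions

events = [("order_42", "created"), ("order_7", "created"),
          ("order_42", "paid"), ("order_42", "shipped")]

partitions = defaultdict(list)
for order_id, event in events:
    partitions[partition_for(order_id, 3)].append((order_id, event))
# all of order_42's events land in one partition, in produce order
```

Whatever order_7 does on its own partition, order_42's lifecycle events are consumed in the sequence they were produced.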

How to Prepare

Kafka Interview — Priority by Role

Backend Developer

  • Producer/consumer basics
  • Topics and partitions
  • Consumer groups
  • Delivery semantics
  • Kafka vs RabbitMQ

Data Engineer

  • Kafka Connect
  • Kafka Streams
  • Schema Registry (Avro)
  • Exactly-once semantics
  • Data pipeline design

DevOps / SRE

  • Cluster management
  • Replication & ISR
  • Monitoring (lag, throughput)
  • Broker failure handling
  • Performance tuning

Practice Kafka Interview Questions with AI

Get asked real Kafka interview questions — partitioning, consumer groups, delivery guarantees, and architecture design. Practice explaining distributed systems concepts.

Free · AI-powered feedback · System design questions