Data Retention Strategies in Kafka
Data retention is a crucial aspect of managing a Kafka cluster. It determines how long messages are stored before they are discarded. Kafka provides two main strategies for data retention: time-based retention and size-based retention.
Time-Based Retention
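Time-based retention is configured using the log.retention.hours, log.retention.minutes, or log.retention.ms parameters. If more than one is set, the finer-grained unit takes precedence (ms over minutes, minutes over hours). These settings specify the maximum time for which a log segment is retained before it is eligible for deletion.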
Key points:
- Default value is 168 hours (7 days).
- Applies to all partitions of a topic.
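- Retention is enforced on whole segments: a segment becomes eligible only once its newest record is older than the limit, so individual messages can outlive the configured period.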
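The three time settings can be reduced to a single value in milliseconds. The following is a minimal sketch of that resolution, assuming the documented precedence (log.retention.ms, then log.retention.minutes, then log.retention.hours); the function name effective_retention_ms is hypothetical and is not part of Kafka.

```shell
#!/usr/bin/env bash
# Sketch: resolve the effective time-based retention in milliseconds.
# Arguments stand for log.retention.ms, log.retention.minutes and
# log.retention.hours; pass an empty string for a setting that is not configured.
effective_retention_ms() {
  local ms="$1" minutes="$2" hours="$3"
  if [ -n "$ms" ]; then
    echo "$ms"
  elif [ -n "$minutes" ]; then
    echo $(( minutes * 60 * 1000 ))
  else
    # Fall back to the 168-hour default when nothing is set.
    echo $(( ${hours:-168} * 60 * 60 * 1000 ))
  fi
}

effective_retention_ms "" "" ""            # prints 604800000 (7 days)
effective_retention_ms "" 90 24            # prints 5400000 (minutes win over hours)
effective_retention_ms 2592000000 "" 24    # prints 2592000000 (ms wins)
```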
Size-Based Retention
Size-based retention is configured using the log.retention.bytes parameter. This setting specifies the maximum size of a partition before old log segments are deleted to free up space.
Key points:
- Default value is -1 (no size limit).
- Applies per partition, not per topic: a topic with 10 partitions can retain roughly 10 times this value on each replica.
- Useful for managing disk space.
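Because the limit is applied per partition, a rough upper bound on cluster-wide disk usage for one topic can be estimated with simple arithmetic. The sketch below assumes every partition fills up to its limit and that each replica stores a full copy; max_retained_bytes is a hypothetical helper name, and the real footprint can be somewhat higher because data is removed a whole segment at a time.

```shell
#!/usr/bin/env bash
# Sketch: upper-bound estimate of bytes retained for one topic across the cluster.
# Arguments: partition count, per-partition size limit in bytes, replication factor.
max_retained_bytes() {
  local partitions="$1" retention_bytes="$2" replication_factor="$3"
  echo $(( partitions * retention_bytes * replication_factor ))
}

# 12 partitions, 1 GiB per partition, replication factor 3
max_retained_bytes 12 1073741824 3   # prints 38654705664 (36 GiB)
```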
Log Segment Deletion
Log segment deletion is driven by whichever retention limits are configured. Time and size limits can be set together, in which case a log segment becomes eligible for deletion as soon as either criterion (age or size) is met.
Important aspects:
- Deletion runs in the background: the broker checks for eligible segments periodically (log.retention.check.interval.ms, 5 minutes by default).
- Deletion is performed per partition.
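- Segment deletion doesn’t change committed consumer offsets, but a committed offset that points at deleted data can no longer be read; the consumer then falls back to its auto.offset.reset policy.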
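To illustrate how size-based deletion works on whole segments, here is a simplified model, not the broker's actual code. It assumes the oldest segments are removed one at a time for as long as the partition would still hold at least the configured limit afterwards. The function name segments_to_delete and the sample sizes are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of size-based retention for a single partition.
# Arguments: size limit in bytes, then segment sizes in bytes, oldest first.
# Prints how many of the oldest segments would be deleted.
segments_to_delete() {
  local limit="$1"; shift
  # A negative limit (the -1 default) means no size-based deletion.
  if [ "$limit" -lt 0 ]; then echo 0; return; fi
  local total=0 size count=0
  for size in "$@"; do total=$(( total + size )); done
  local excess=$(( total - limit ))
  for size in "$@"; do
    # Delete a segment only if the partition stays at or above the limit.
    if [ $(( excess - size )) -ge 0 ]; then
      excess=$(( excess - size ))
      count=$(( count + 1 ))
    else
      break
    fi
  done
  echo "$count"
}

segments_to_delete 1000 400 400 400 400   # prints 1 (1200 bytes remain)
segments_to_delete 1000 600 600 600 600   # prints 2 (1200 bytes remain)
```

In this model the partition can sit above the limit by up to one segment, which is why the segment size matters when planning disk capacity.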
Implications of Data Retention
The choice of data retention strategy and parameters has several implications:
- Disk Space Usage:
  - Longer retention periods or larger size limits lead to increased disk space consumption.
  - Proper monitoring and capacity planning are essential.
- Data Availability:
  - Shorter retention periods may result in data being unavailable for consumers that fall behind.
  - Consumers need to process data within the retention window.
- Replay Scenarios:
  - Longer retention allows for replaying events from a specific point in time.
  - Useful for error recovery, auditing, or data reprocessing.
- Compliance and Legal Requirements:
  - Certain industries may have specific data retention requirements.
  - Retention settings should align with legal and compliance needs.
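When only a size limit is in effect, the window available to slow consumers depends on how fast data arrives. A back-of-the-envelope sketch, assuming a steady write rate into a single partition (size_retention_window_seconds is a hypothetical helper, and real traffic is rarely this uniform):

```shell
#!/usr/bin/env bash
# Sketch: approximate how long data survives under a size limit.
# Arguments: per-partition size limit in bytes, write rate in bytes per second.
size_retention_window_seconds() {
  local retention_bytes="$1" bytes_per_second="$2"
  echo $(( retention_bytes / bytes_per_second ))
}

# 1 GiB limit at 1 MiB/s
size_retention_window_seconds 1073741824 1048576   # prints 1024 (about 17 minutes)
```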
Configuring Retention Settings
Retention settings can be configured at different levels:
- Broker Level:
  - Configured in the server.properties file (for example, log.retention.hours or log.retention.bytes).
  - Applies to all topics unless overridden at the topic level.
- Topic Level:
  - Configured using topic-level configuration (retention.ms and retention.bytes).
  - Overrides the broker-level settings for the specific topic.
# Example: Set retention period to 30 days for a topic
kafka-configs.sh --bootstrap-server localhost:9092 --alter --topic my-topic --add-config retention.ms=2592000000
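After changing a topic override it is worth reading the configuration back. The sketch below only prints the commands rather than running them, so it is safe to try without a cluster. The helper name retention_commands, the broker address, and the topic name are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the kafka-configs.sh commands
# that set a topic-level retention override and then read it back.
# BOOTSTRAP is an assumed broker address; change it for your cluster.
BOOTSTRAP="localhost:9092"

retention_commands() {
  local topic="$1" days="$2"
  local ms=$(( days * 24 * 60 * 60 * 1000 ))
  echo "kafka-configs.sh --bootstrap-server $BOOTSTRAP --alter --entity-type topics --entity-name $topic --add-config retention.ms=$ms"
  echo "kafka-configs.sh --bootstrap-server $BOOTSTRAP --describe --entity-type topics --entity-name $topic"
}

retention_commands my-topic 30
```

The printed commands use the long --entity-type/--entity-name form to identify the topic. Copy them into a terminal, or pipe the output to sh, once the values look right.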