Kubernetes CronJobs
Introduction to CronJobs
In Kubernetes, CronJobs are native objects that manage time-based jobs. They are essentially the Kubernetes equivalent of a cron entry in a traditional Unix system. A CronJob runs Jobs on a time-based schedule; each Job in turn creates one or more Pods for its task and manages them until they complete.
Defining a CronJob
Here’s a basic structure for a CronJob in YAML:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "*/1 * * * *" # Every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-container
            image: my-image
          restartPolicy: OnFailure
- schedule: A string that defines when the job should run, in the standard cron format.
- jobTemplate: The Job that should be run at the specified schedule. It is essentially a Pod specification with added fields relevant to Jobs.
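The schedule string uses the standard five-field cron syntax (minute, hour, day of month, month, day of week). A few common examples, each line showing one alternative:

```yaml
# minute hour day-of-month month day-of-week
schedule: "*/15 * * * *"   # every 15 minutes
schedule: "0 * * * *"      # at the top of every hour
schedule: "0 0 * * *"      # daily at midnight
schedule: "0 9 * * 1-5"    # at 09:00 on weekdays (Mon-Fri)
```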
Other important properties
You can include startingDeadlineSeconds and other CronJob-level fields under the spec section, and Job-level fields such as completions and parallelism under jobTemplate.spec. Here’s an example that extends the earlier YAML configuration:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "*/1 * * * *"        # Every minute
  startingDeadlineSeconds: 100   # Max time to start a job if its scheduled run is missed
  concurrencyPolicy: Forbid      # Options: Allow, Forbid, Replace
  successfulJobsHistoryLimit: 3  # Keep 3 successful job records
  failedJobsHistoryLimit: 1      # Keep 1 failed job record
  jobTemplate:
    spec:
      backoffLimit: 3            # Number of retries before marking the job as failed
      completions: 5             # Total number of successful completions needed
      parallelism: 2             # Number of pods run in parallel
      activeDeadlineSeconds: 200 # Max time for the job to run
      ttlSecondsAfterFinished: 3600 # Clean up finished jobs after 1 hour
      template:
        spec:
          containers:
          - name: my-container
            image: my-image
          restartPolicy: OnFailure
In this modified configuration:
- startingDeadlineSeconds specifies the deadline in seconds for starting the job if it misses its scheduled time for any reason.
- concurrencyPolicy determines how concurrent executions of a job are treated.
- successfulJobsHistoryLimit and failedJobsHistoryLimit specify how many completed and failed Jobs should be kept.
- Inside the jobTemplate, completions is the number of times the Job needs to complete successfully, parallelism is the number of Pods that run in parallel, and activeDeadlineSeconds sets the maximum duration the Job can run.
- backoffLimit is set to 3, meaning the Job will be retried up to 3 times before it is considered failed.
- ttlSecondsAfterFinished is set to 3600 seconds (1 hour), meaning finished Jobs (whether successful or failed) are automatically deleted one hour after completing.
The restartPolicy in a Job, or in the Pod template within a CronJob, plays a crucial role in how Kubernetes handles container failures or terminations.
- For Jobs and CronJobs, restartPolicy is particularly important. A common setting is OnFailure, as you typically want the job to retry if it fails.
- Always is not a valid restartPolicy for Jobs and CronJobs, since Jobs are intended to run to completion (either success or failure), and an always-restarting policy would contradict this behavior.
- Never can be used if you want to ensure that no automatic restarts are attempted, which can be useful for debugging or in scenarios where a failure must be handled in a custom manner.
- The combination of restartPolicy and backoffLimit in a job definition defines the job’s retry behavior. For example, if restartPolicy is set to OnFailure and backoffLimit is greater than 0, the job will be retried up to the specified limit if it fails.
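To make the interplay concrete, here is a minimal standalone Job sketch combining the two fields (the name, image, and always-failing command are placeholders chosen to exercise the retry logic):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-demo              # placeholder name
spec:
  backoffLimit: 3               # retry failed Pods up to 3 times before the Job is marked failed
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "exit 1"]  # always fails, to trigger retries
      restartPolicy: OnFailure  # restart the container in place on failure
```

With OnFailure, retries restart containers within the existing Pod; with Never, each retry creates a new Pod, which makes failed attempts easier to inspect afterwards.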
YAML differences between Jobs and CronJobs
Understanding these differences is crucial, especially for scenarios covered in the CKAD exam:
- Schedule: Only present in CronJob for defining the time-based schedule.
- Concurrency Policy and History Limits: Specific to CronJob for managing job execution and retention.
- JobTemplate: In a CronJob, the job configuration is nested under jobTemplate, whereas in a Job the configuration sits directly under spec.
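For comparison, here is the same Pod template written as a one-off Job (reusing the placeholder names from the CronJob example): note that there is no schedule field, and the Pod template sits directly under spec.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  template:
    spec:
      containers:
      - name: my-container
        image: my-image
      restartPolicy: OnFailure
```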
Monitoring and Troubleshooting CronJobs
- List CronJobs: Use the command kubectl get cronjobs to list existing CronJobs.
- View CronJob Details: To see more details, use kubectl describe cronjob <cronjob-name>.
- View Logs: Since CronJobs create Jobs, which in turn create Pods, to view the logs of a CronJob you first need to identify the specific Job/Pod. Run kubectl get jobs to find the Job, then kubectl logs <pod-name> to fetch the logs.
- Common Issues:
- Misconfigured schedule
- Image or command errors inside the container
- Insufficient resources or quotas
- Job running longer than expected
Best Practices
- Idempotence: Ensure that the tasks being run are idempotent. If a CronJob fails and is retried, it shouldn’t create unintended side effects.
- Concurrency: By default, CronJobs are allowed to run concurrently. Use the concurrencyPolicy field to adjust this if needed.
- Failure Handling: Use the restartPolicy to define what should happen if the job fails. Most of the time, OnFailure is a good choice.
- Cleanup Old Jobs: By default, a history of finished Jobs is kept (3 successful and 1 failed), which can clutter the cluster. Adjust the .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields to control how many old Jobs are retained.
Exercises for CronJobs, Troubleshooting, and Imperative Commands
Exercise 1: Creating a CronJob Imperatively
Objective: Create a CronJob that runs every 5 minutes using the busybox image and the command echo "Hello from CronJob!".
- Use the imperative command to create a CronJob.
- Validate that the CronJob has been created.
- Observe the Pods created by the CronJob over a 10-minute period.
Solution:
kubectl create cronjob hello-cron --image=busybox --schedule="*/5 * * * *" -- echo "Hello from CronJob!"
kubectl get cronjobs
# Wait and check for pods periodically
kubectl get pods
Exercise 2: Troubleshooting a Failing CronJob
Objective: Troubleshoot a CronJob whose Pods keep failing.
- Here’s a CronJob YAML for a task that’s supposed to run every minute:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: faulty-cron
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-container
            image: busybox
            command:
            - "/bin/nonexistent"
          restartPolicy: OnFailure
- Apply this YAML to your cluster.
- Identify the issue that’s causing the CronJob’s Pods to fail.
- Fix the issue and verify.
Solution:
# Apply the CronJob
kubectl apply -f faulty-cron.yaml
# Check the jobs
kubectl get jobs
# Describe the job to see what's wrong
kubectl describe job <job-name>
# From the describe command, you'll notice a command error.
# The command doesn't exist, which causes the container to fail immediately.
# To fix, modify the YAML to have a valid command, e.g., ["echo", "Fixed!"]
Exercise 3: Using Imperative Commands for Troubleshooting
Objective: Delete all the Pods associated with a specific CronJob.
- Use the previous CronJob definition (hello-cron).
- Generate some Pods by waiting for 10 minutes.
- Use imperative commands to fetch all the Pods associated with hello-cron.
- Delete all these Pods using a single command.
Solution:
# Fetch the pods of a Job created by the CronJob
# (each Job's Pods carry a job-name label; substitute the Job's unique suffix)
pods=$(kubectl get pods --selector=job-name=hello-cron-<unique_id> -o=jsonpath='{.items[*].metadata.name}')
# Delete these pods
kubectl delete pods $pods