What is Apache Airflow?
Apache Airflow is an open-source tool that helps you automate, schedule, and monitor workflows. A workflow is a set of tasks that need to run in a specific order.
Think of it like this:
- You define tasks (e.g., sending an email, cleaning data).
- You schedule them (e.g., run every day at 6 PM).
- Airflow makes sure they run in order, retries them if they fail, and shows you logs and status.
What is the Airflow Scheduler?
The Scheduler is the component of Airflow that:
- Reads your workflows (called DAGs)
- Checks whether it's time to run any task
- Sends tasks to workers for execution
You don't write code to create the Scheduler itself; instead, you write the DAGs that the Scheduler reads.
Step-by-Step: How to Use Apache Airflow
1. Install Airflow
Use the official method with pip. Run this in your terminal:
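```bash
# The official docs also recommend a constraints file to pin dependency versions
pip install apache-airflow
```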
Set up Airflow environment:
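```bash
# Optional: choose the Airflow home directory (~/airflow is the default)
export AIRFLOW_HOME=~/airflow

# Initialize the metadata database
airflow db init
```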
Create a user:
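The account details below are placeholders; substitute your own:

```bash
# Username, name, and email are example values
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com
```

You will be prompted to set a password.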
Start the services:
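```bash
airflow webserver --port 8080
```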
In a new terminal:
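```bash
airflow scheduler
```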
Now go to http://localhost:8080. This is your Airflow UI.
2. Create Your First DAG (Workflow)
Go to your DAGs folder (~/airflow/dags) and create a file named daily_email_dag.py:
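Here is a minimal sketch of the DAG, assuming Airflow 2.x. The start_date below is an arbitrary example; the dag_id, schedule, and send_email() function match the walkthrough in the next step:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def send_email():
    # Placeholder task logic; a real task might call an email API here
    print("Sending the daily email...")


with DAG(
    dag_id="daily_email_sender",       # unique ID for the workflow
    start_date=datetime(2024, 1, 1),   # example date; when scheduling begins
    schedule_interval="0 18 * * *",    # every day at 6 PM
    catchup=False,                     # don't backfill runs for past dates
) as dag:
    send_email_task = PythonOperator(
        task_id="send_email",
        python_callable=send_email,
    )
```

Save the file and the Scheduler picks it up automatically, usually within a few minutes.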
3. Understand the Code
| Section | What It Does |
| --- | --- |
| `send_email()` | A function that will run as your task |
| `PythonOperator` | Runs your function as an Airflow task |
| `schedule_interval` | Tells Airflow to run this every day at 6 PM |
| `dag_id` | Unique ID for your workflow |
| `start_date` | When Airflow should start running the DAG |
4. See It in Action
- Go to http://localhost:8080.
- Find the DAG named daily_email_sender.
- Turn it ON (toggle switch).
- You can click "Trigger DAG" to run it manually or wait for the schedule.
- View logs to see the print output.
Common Schedule Examples
| Schedule | schedule_interval Value |
| --- | --- |
| Every day at midnight | `'@daily'` |
| Every hour | `'@hourly'` |
| Every 10 minutes | `'*/10 * * * *'` |
| Every Monday | `'0 0 * * 1'` |
| No schedule, manual only | `None` |
Conclusion
- Airflow makes automation simple.
- The Scheduler runs your tasks on time.
- You define everything in Python using DAGs.
- Airflow shows logs, retries on failure, and monitors workflows.
Happy coding!