How to set up Apache Airflow on your local machine
To set up Airflow, you'll need Python and a few other dependencies installed on your machine. Here's a step-by-step guide to setting up Airflow on your local machine:
Install Python and create a virtual environment
If you don't already have Python installed, you can download it from the official Python website. Once you have Python installed, create a virtual environment with the built-in venv module:
python -m venv airflow_env
This will create a new virtual environment called "airflow_env" in the current directory. To activate the virtual environment on macOS or Linux, enter the following command (on Windows, run airflow_env\Scripts\activate instead):
source airflow_env/bin/activate
Install Airflow
Once your virtual environment is set up, you can install Airflow using pip, the Python package manager. Open a terminal and enter the following command:
pip install apache-airflow
This will install the latest version of Airflow and all of its dependencies. Note that the Airflow project recommends installing with a constraints file to pin known-compatible dependency versions; see the official installation documentation for details.
Initialize the Airflow database
After installing Airflow, you'll need to initialize its metadata database. This creates the tables Airflow uses to store information about your DAGs and tasks. In Airflow 2.x, enter the following command:
airflow db init
(In the legacy 1.x series the equivalent command was airflow initdb.) On Airflow 2.x you'll also need to create a user before you can log in to the web UI, for example:
airflow users create --username admin --firstname Ada --lastname Lovelace --role Admin --email admin@example.com
Start the Airflow web server
Once you have Airflow installed and the database initialized, you can start the web server by running the following command:
airflow webserver
This will start the Airflow web server and make it available at http://localhost:8080. You can access the web interface by opening this URL in your web browser.
Start the Airflow scheduler
In addition to the web server, you'll also need to start the Airflow scheduler in order to run your workflows. To do this, open a new terminal window (with the same virtual environment activated) and enter the following command:
airflow scheduler
This will start the Airflow scheduler and enable it to run your workflows according to their schedules.
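Conceptually, on each pass the scheduler checks which DAGs have a scheduled run that is now due. The sketch below is a toy model of that check in plain Python, not Airflow's actual implementation, and the DAG IDs and times are invented for illustration:

```python
from datetime import datetime

def due_dags(next_run_times, now):
    """Toy model of one scheduler pass: return the IDs of DAGs whose
    next scheduled run time has arrived (not Airflow's real logic)."""
    return sorted(dag_id for dag_id, next_run in next_run_times.items()
                  if next_run <= now)

# Hypothetical DAGs and their next scheduled run times.
schedule = {
    "tutorial": datetime(2023, 5, 1, 15, 0),
    "nightly_report": datetime(2023, 5, 2, 0, 0),
}

print(due_dags(schedule, datetime(2023, 5, 1, 15, 5)))  # ['tutorial']
```

The real scheduler also handles retries, concurrency limits, and backfills, which this sketch leaves out.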
How to run your first workflow in Apache Airflow
Now that you have Airflow set up on your local machine, you're ready to create and run your first workflow. Here are the steps you'll need to follow:
Create a new DAG
The first step in creating a workflow is to define a DAG (directed acyclic graph). A DAG is a collection of tasks organized with explicit dependencies, and it can be triggered to run on a schedule or in response to certain events. To create a new DAG, add a Python file to the "dags" directory inside your Airflow home directory (by default ~/airflow/dags; the location is controlled by the dags_folder setting in airflow.cfg).
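To make the "collection of tasks with dependencies" idea concrete, here is a plain-Python sketch (no Airflow required) of how a DAG's dependency edges determine a valid execution order; the task names are invented for illustration:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on (its upstream tasks).
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# A valid execution order always runs a task after its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'transform', 'load', 'notify']
```

Airflow does the same kind of dependency resolution for you when it decides which task instances are ready to run.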
Define your tasks
Once you have your DAG defined, you'll need to add some tasks to it. Airflow includes a wide range of operators that you can use to perform different types of work. For example, you can use the PythonOperator to execute arbitrary Python code, the BashOperator to run shell commands, and SQL operators such as SQLExecuteQueryOperator (from the common SQL provider package) to execute SQL statements.
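To illustrate the difference between the first two operator styles outside of Airflow: a PythonOperator essentially calls a Python function you supply, while a BashOperator runs a command in a subshell. The function and command below are made up for the example:

```python
import subprocess

# What a PythonOperator runs: an ordinary Python callable.
def greet():
    return "hello from python"

# What a BashOperator runs: a shell command in a subprocess.
completed = subprocess.run("echo hello from bash", shell=True,
                           capture_output=True, text=True)

print(greet())                   # hello from python
print(completed.stdout.strip())  # hello from bash
```

In a real DAG you would pass greet as python_callable to a PythonOperator and the echo string as bash_command to a BashOperator, rather than running them directly.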
Set the schedule
Once you have your tasks defined, you'll need to specify when they should run. You can do this using the schedule_interval argument in your DAG definition. For example, to run your workflow every hour, you can set the schedule interval like this:
from airflow import DAG
from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator in Airflow 1.x
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 12, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# schedule_interval is an argument to the DAG itself, not a task default
dag = DAG('tutorial', catchup=False, default_args=default_args,
          schedule_interval='@hourly')

say_hello = BashOperator(task_id='say_hello', bash_command='echo hello', dag=dag)
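As a plain-Python illustration of what an '@hourly' schedule_interval means (keeping in mind that Airflow actually triggers each run at the end of its schedule interval), here is a sketch of computing the next top-of-hour:

```python
from datetime import datetime, timedelta

def next_top_of_hour(now):
    """Return the first top-of-hour strictly after `now`, roughly when
    an '@hourly' schedule would next fire (illustration only)."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)

print(next_top_of_hour(datetime(2023, 5, 1, 14, 37)))  # 2023-05-01 15:00:00
```

Besides presets like '@hourly' and '@daily', schedule_interval also accepts cron expressions (e.g. '0 * * * *') and timedelta objects.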
For more details, see the official Apache Airflow documentation.