Feeding the Data-lake I: Task management / automation using Airflow
What is Airflow?
Airflow is a platform to programmatically author, schedule and monitor workflows. The idea is to program your workflow automation using Python.
We use the DAG concept to build tasks on Airflow. DAG stands for Directed Acyclic Graph. These tasks are then interpreted and scheduled for execution by the Airflow execution engine.
What is a DAG?
In graph theory, a directed acyclic graph is a graph whose edges have a direction and which contains no cycles: following the edges from any node, you can never get back to where you started. This makes DAGs a natural fit for modeling workflows, since every task can be ordered to run only after all of its upstream dependencies have finished.
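To make the DAG idea concrete, here is a small, self-contained Python sketch (plain Python, not Airflow code) that represents a workflow as a DAG and computes a valid execution order with a topological sort. The task names are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A valid execution order runs every task only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'load', 'report']
```

If the graph contained a cycle (say, "extract" also depending on "report"), `TopologicalSorter` would raise a `CycleError`, which is exactly why schedulers like Airflow require the graph to be acyclic.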
Installing Airflow in Docker
Let's build a Docker cluster to run Airflow locally by following the official documentation. Please refer to that documentation if you bump into any issues or want a more thorough view of this setup.
Step 1: Install docker-compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
Step 2: Get the docker-compose YAML example file
This YAML file has a default config to run Airflow for testing/developing purposes. You can use it as a base to build your own docker-compose config for production environments.
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.4/docker-compose.yaml'
Step 3: Creating the local folders and .env file
The local folder names are specified in the docker-compose file and will be used by Airflow to read your DAGs, write logs, and load/store plugins.
The .env file is read by docker-compose and its contents are passed to the containers as environment variables, so you can put some important settings here as well, such as the user ID the Airflow services run as. Defining this ID prevents the services from running as root inside the container.
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env
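As an illustration of what goes into the ./dags folder, a minimal DAG file might look like the sketch below. It assumes the Airflow 2.x API (the `airflow.operators.bash` import path); the `dag_id`, schedule and bash commands are made up for this example.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical example DAG: two shell tasks, "transform" running after "extract".
with DAG(
    dag_id="example_feed_datalake",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform = BashOperator(task_id="transform", bash_command="echo transforming")

    # Declare the edge of the DAG: transform depends on extract.
    extract >> transform
```

Saved as, for example, ./dags/example_feed_datalake.py, the scheduler picks it up automatically and the DAG appears in the web interface.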
Step 4: Containers initial setup
The following command creates the Docker containers, initializes the Postgres database, and performs the initial setup (database migrations and the first user account).
docker-compose up airflow-init
Step 5: Run
For the final step, run the following command and check that Airflow is running by accessing its web interface (http://localhost:8080 by default) and triggering one of the bundled example DAGs.
docker-compose up -d
Official documentation: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html
Once you load the web interface, you'll see something like this