I love the idea of Airflow, but I'm stuck on the basics. Since yesterday I have had Airflow running on an Ubuntu VM with Postgres. I can see the dashboard and the example data :)) What I want now is to migrate an example script that I use to process raw data into prepared data.
Imagine you have a folder of CSV files. Today my script iterates through it, appending each file to a list that is then concatenated into a DataFrame. After that I clean up the column names, do some data cleaning on the values, and write the result out in a different format. The steps are roughly (see the sketch after the list):
1: pd.read_csv for files in directory
2: create a df
3: clean column names
4: clean values (in parallel with step 3)
5: write the result to a database
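Stripped down, my current script looks something like the sketch below. The directory, the connection string, and the concrete cleaning rules are just simplified placeholders, not my real values:

```python
import glob
import pandas as pd
from sqlalchemy import create_engine

# Placeholder paths/connection string for illustration only.
CSV_DIR = "/data/raw"
DB_URI = "postgresql://user:password@localhost:5432/mydb"

def run():
    # 1-2: read every CSV in the directory and concatenate into one DataFrame
    frames = [pd.read_csv(path) for path in glob.glob(f"{CSV_DIR}/*.csv")]
    df = pd.concat(frames, ignore_index=True)

    # 3: clean column names
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # 4: clean values (simplified example: drop completely empty rows)
    df = df.dropna(how="all")

    # 5: write the result to a database table
    engine = create_engine(DB_URI)
    df.to_sql("prepared_data", engine, if_exists="replace", index=False)

if __name__ == "__main__":
    run()
```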
How would I have to organize my files for Airflow? What should the script look like? Do I pass a single method, a single file, or do I have to create several files for each part? I'm lacking the basic concept at this point :( Everything I read about Airflow is way more complex than my simple case. I was also considering stepping away from Airflow towards Bonobo, Mara, or Luigi, but I think Airflow is worth it?!
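For context, a single-file DAG with one PythonOperator per step is roughly what I'm imagining, but I have no idea if this is the right structure. The DAG id, task names, and schedule below are just guesses, and the callables are stubs that would wrap my existing pandas logic:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 1.x: airflow.operators.python_operator

def prepare_csvs():
    """Read all CSVs, clean column names and values, write the cleaned data to a staging area."""
    ...

def load_to_db():
    """Load the staged, cleaned data into the target database table."""
    ...

with DAG(
    dag_id="csv_to_prepared_data",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",      # guess; could also be triggered manually
    catchup=False,
) as dag:
    prepare = PythonOperator(task_id="prepare_csvs", python_callable=prepare_csvs)
    load = PythonOperator(task_id="load_to_db", python_callable=load_to_db)

    prepare >> load
```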