TL;DR: I have a lot of data that needs to be treated in a consistent way, and I'm looking for a way to automate it.
Hi everyone, I'm dealing with datasets from 2013 to 2022 on forest fires in Portugal. The main objective is to provide the general public with official information on the number of fires, burnt area, and other indicators, so that this issue can be discussed based on facts, not speculation.
The problem is that I'm struggling to create a workflow that doesn't have me writing the same code for every dataframe.
I would like to know if there is a method where I can turn this:
total_records_2022 = df_in_2022['id'].nunique()
total_records_2021 = df_in_2021['id'].nunique()
total_records_2020 = df_in_2020['id'].nunique()
total_records_2019 = df_in_2019['id'].nunique()
total_records_2018 = df_in_2018['id'].nunique()
total_records_2017 = df_in_2017['id'].nunique()
total_records_2016 = df_in_2016['id'].nunique()
total_records_2015 = df_in_2015['id'].nunique()
total_records_2014 = df_in_2014['id'].nunique()
total_records_2013 = df_in_2013['id'].nunique()
or this:
# GET TOTAL BURNT AREA FOR EACH YEAR
total_burn_area_measure = " ha"
# Treat missing burnt-area values as 0, then sum the cleaned column
df_in_2022_reset_burntarea = df_in_2022['icnf.burnArea.total'].fillna(0)
total_burnt_area_2022_number_full = df_in_2022_reset_burntarea.sum()
total_burnt_area_2022_number = "{:.2f}".format(total_burnt_area_2022_number_full)
total_burnt_area_2022 = total_burnt_area_2022_number + total_burn_area_measure
into a loop that runs over all the dataframes and applies whatever data treatment I need. The full code can be found here and, as you can see, there is a lot of data to be treated, but in a consistent way.
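To make it concrete, what I have in mind is roughly the sketch below. It is not taken from my project: it assumes the yearly dataframes could be collected in a dict keyed by year, and the function name summarise_years, the frames dict and the sample values are all made up for illustration. Only the 'id' and 'icnf.burnArea.total' columns come from my real data.

```python
import pandas as pd


def summarise_years(frames, area_col="icnf.burnArea.total", unit=" ha"):
    """Return one row per year with the record count and total burnt area.

    `frames` is a hypothetical dict mapping a year to that year's dataframe.
    """
    rows = []
    for year, df in sorted(frames.items()):
        # Missing burnt-area values are counted as 0
        area = df[area_col].fillna(0).sum()
        rows.append({
            "year": year,
            "total_records": df["id"].nunique(),
            "total_burnt_area": area,
            "total_burnt_area_label": f"{area:.2f}{unit}",
        })
    return pd.DataFrame(rows).set_index("year")


# Tiny made-up frames standing in for df_in_2021 and df_in_2022
frames = {
    2021: pd.DataFrame({"id": [1, 2, 2], "icnf.burnArea.total": [1.5, None, 2.0]}),
    2022: pd.DataFrame({"id": [3, 4], "icnf.burnArea.total": [0.25, 10.0]}),
}
summary = summarise_years(frames)
print(summary)
```

If something like this is reasonable, every new indicator would become one more entry in the row dict instead of ten more lines of copy-pasted code, and the result would be a single table indexed by year rather than dozens of separate variables.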
Any help or guidance you could give would be much appreciated.
Full disclosure: This code will be used for non-commercial purposes.