I have a dataframe that looks like this:
country region region_id year doy variable_a num_pixels
0 USA Iowa 12345 2022 1 32.2 100
1 USA Iowa 12345 2022 2 12.2 100
2 USA Iowa 12345 2022 3 22.2 100
3 USA Iowa 12345 2022 4 112.2 100
4 USA Iowa 12345 2022 5 52.2 100
The year in the dataframe above is 2022. I have more dataframes for other years starting from 2010 onwards.
I have also dataframes for other variables: variable_b
, variable_c
.
I want to combine all these dataframes into a single dataframe such that
- The years are listed vertically, one below the other
- the data for the different variables is listed horizontally. The output should look like this:
country region region_id year doy variable_a variable_b variable_c
0 USA Iowa 12345 2010 1 32.2 44 101
1 USA Iowa 12345 2010 2 12.2 76 2332
..........................................................................
n-1 USA Iowa 12345 2022 1 321.2 444 501
n USA Iowa 12345 2022 2 122.2 756 32
What is the most efficient way to achieve this? Please note that there will be overlap in years in the other dataframes so the solution needs to take that into account and not leave NaN values.