I have six CSV files for six different years and I'd like to combine them into a single dataframe, with the column headers appropriately labelled.
Each raw CSV file looks like this (e.g. 2010.csv):
state,gender,population
FL,m,2161612
FL,f,2661614
TX,m,3153523
TX,f,3453523
...
And this is the structure I'd like to end up with:
state gender population_2010 population_2012 population_2014 .....
FL m 2161612 xxxxxxx xxxxxxx .....
FL f 2661614 xxxxxxx xxxxxxx .....
TX m 3153523 xxxxxxx xxxxxxx .....
TX f 3453523 xxxxxxx xxxxxxx .....
How can I do this efficiently? Currently I have this:
df_2010 = pd.read_csv("2010.csv")
df_2012 = pd.read_csv("2012.csv")
...
temp = df_2010.merge(df_2012, on=("state", "gender"), how="outer", suffixes=("_2010", "_2012"))
temp1 = temp.merge(df_2014, on=("state", "gender"), how="outer", suffixes=(None, "_2014"))
... repeat five more times to get the final dataframe
But I feel there must be a better way.
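One direction that might avoid the chained merges, sketched here under some assumptions: rename `population` to `population_<year>` as each file is read, index every frame by `(state, gender)`, and align them all in one `pd.concat(..., axis=1)` call. Renaming up front also avoids relying on `suffixes`, which pandas applies only to column names that overlap between the two frames being merged. The helper name `combine_years`, the in-memory sample strings, and the 2012 figures are made up for illustration. The sketch also assumes each `(state, gender)` pair appears at most once per file.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files.
# The 2010 rows come from the sample above; the 2012 values are invented.
sample_files = {
    2010: "state,gender,population\nFL,m,2161612\nFL,f,2661614\nTX,m,3153523\nTX,f,3453523\n",
    2012: "state,gender,population\nFL,m,2200000\nFL,f,2700000\nTX,m,3200000\nTX,f,3500000\n",
}


def combine_years(sources):
    """Combine per-year files into one wide frame.

    `sources` maps a year to anything pd.read_csv accepts (a path or a file-like object).
    """
    frames = []
    for year, src in sources.items():
        df = pd.read_csv(src)
        # Label the value column with its year before combining.
        df = df.rename(columns={"population": f"population_{year}"})
        # Use the shared keys as the index so concat can align rows on them.
        frames.append(df.set_index(["state", "gender"]))
    # join="outer" keeps keys that are missing from some years (filled with NaN),
    # similar to a chain of outer merges.
    combined = pd.concat(frames, axis=1, join="outer")
    return combined.reset_index()


combined = combine_years({year: io.StringIO(text) for year, text in sample_files.items()})
print(combined)
```

With real files, the same helper could be called with paths instead, e.g. `combine_years({2010: "2010.csv", 2012: "2012.csv"})`, extended with whatever the remaining years and file names are.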