Let's say I have data about cities, with the name and age of the mayor and of the baker (if there is one):
        city name_mayor age_mayor name_baker age_baker
0  Cherbourg     Robert        10       Jack        40
1     Calais     Michel        20     Russel        50
2     Nevers        Guy        30       None      None
I then want to create a new dataframe to work on the individuals, so I would like a dataframe like this:
        city    name  age
0  Cherbourg  Robert   10
1     Calais  Michel   20
2     Nevers     Guy   30
3  Cherbourg    Jack   40
4     Calais  Russel   50
That layout makes it easier to compute things such as the mean age.
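To illustrate what I mean, here is a minimal sketch of the kind of computation I have in mind on the one-row-per-individual layout. The variable names (`people`, `mean_age`, `mean_age_by_city`) are just placeholders I picked for this example, and the values are the ones from the table above.

```python
import pandas as pd

# One row per individual, same values as the target table above.
# `people` is a placeholder name chosen for this sketch.
people = pd.DataFrame({
    "city": ["Cherbourg", "Calais", "Nevers", "Cherbourg", "Calais"],
    "name": ["Robert", "Michel", "Guy", "Jack", "Russel"],
    "age": [10, 20, 30, 40, 50],
})

# Mean age over all individuals, whatever their role.
mean_age = people["age"].mean()

# Mean age per city: a plain groupby, with no per-role columns to juggle.
mean_age_by_city = people.groupby("city")["age"].mean()

print(mean_age)
print(mean_age_by_city)
```

With the wide layout, the same numbers would require combining `age_mayor` and `age_baker` by hand, which is what I am trying to avoid.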
Can anyone tell me:
- How can I do this?
- Is this the right way to work with pandas?
Basically, I can do it by iterating over the rows, but I read that other approaches are often better in pandas (as stated here: How to iterate over rows in a DataFrame in Pandas).
I'm not new to pandas, but I'm still stuck in a "numpy-array" way of thinking.
If needed, here is how I made my two examples:
import pandas as pd

# Wide format: the baker lists are shorter, so build by rows and transpose to pad the gaps
data_1 = {"city": ["Cherbourg", "Calais", "Nevers"], "name_mayor": ["Robert", "Michel", "Guy"], "age_mayor": [10, 20, 30], "name_baker": ["Jack", "Russel"], "age_baker": [40, 50]}
df_1 = pd.DataFrame.from_dict(data_1, orient='index').transpose()
# Long format: one row per individual
data_2 = {0: ["Cherbourg", "Robert", 10], 1: ["Calais", "Michel", 20], 2: ["Nevers", "Guy", 30], 3: ["Cherbourg", "Jack", 40], 4: ["Calais", "Russel", 50]}
df_2 = pd.DataFrame.from_dict(data_2, orient='index', columns=["city", "name", "age"])
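For reference, here is a rough sketch of a reshape that avoids iterating over rows. It loops over the roles (two of them) instead and stacks the pieces with `pd.concat`. It assumes every role follows the `name_<role>` / `age_<role>` column pattern. The names `roles`, `parts` and `people` are placeholders for this example, and the wide frame is rebuilt directly with `None` for the missing baker so the snippet runs on its own. I am not sure this is the idiomatic way, which is really what I am asking.

```python
import pandas as pd

# Wide frame, rebuilt here with explicit None values so the snippet is standalone.
df_1 = pd.DataFrame({
    "city": ["Cherbourg", "Calais", "Nevers"],
    "name_mayor": ["Robert", "Michel", "Guy"],
    "age_mayor": [10, 20, 30],
    "name_baker": ["Jack", "Russel", None],
    "age_baker": [40, 50, None],
})

# Assumption: each role has a `name_<role>` and an `age_<role>` column.
roles = ["mayor", "baker"]

parts = []
for role in roles:
    # Select the columns of one role and give them role-independent names.
    part = df_1[["city", f"name_{role}", f"age_{role}"]].rename(
        columns={f"name_{role}": "name", f"age_{role}": "age"}
    )
    parts.append(part)

# Stack the per-role frames, then drop cities that have nobody in a given role.
people = (
    pd.concat(parts, ignore_index=True)
    .dropna(subset=["name"])
    .reset_index(drop=True)
)

# The missing baker made the ages float; restore integers once the gaps are gone.
people["age"] = people["age"].astype(int)

print(people)
```

The loop runs once per role rather than once per row, so it should stay cheap however many cities there are.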
Thanks! R