Building on this question, I'm starting with this pandas DataFrame:
import pandas as pd
data = {
    'id': [1, 419, 425, 432],
    'city_0': ['Prague', 'Prague', 'Copenhagen', 'Santiago'],
    'city_1': ['Copenhagen', 'Barcelona', 'Barcelona', 'Berlin'],
    'Fare 0->1': [1000, 1200, 1500, 2050],
    'Fare 1->0': [1100, 1150, 1600, 2000],
}
df = pd.DataFrame(data)
Input df:
    id      city_0      city_1  Fare 0->1  Fare 1->0
0    1      Prague  Copenhagen       1000       1100
1  419      Prague   Barcelona       1200       1150
2  425  Copenhagen   Barcelona       1500       1600
3  432    Santiago      Berlin       2050       2000
I'm trying to generate this kind of adjacency matrix, where df.X.Y is the fare for going from X to Y.
Expected output:
            Prague  Copenhagen  Santiago  Barcelona  Berlin
Prague         NaN        1100       NaN       1150     NaN
Copenhagen    1000         NaN       NaN       1600     NaN
Santiago       NaN         NaN       NaN        NaN    2000
Barcelona     1200        1500       NaN        NaN     NaN
Berlin         NaN         NaN      2050        NaN     NaN
What I've tried:
df_city_0 = df[['city_0']].copy()
df_city_1 = df[['city_1']].copy()
df_city_0.columns = ['city'] # rename both the columns to a single name
df_city_1.columns = ['city']
# make them one column (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df_cities = pd.concat([df_city_0, df_city_1])
df_cities = df_cities['city'].unique()
# array(['Prague', 'Copenhagen', 'Santiago', 'Barcelona', 'Berlin'], dtype=object)
df_fares_adjacency = pd.DataFrame(columns=df_cities, index=df_cities)
# Prague Copenhagen Santiago Barcelona Berlin
# Prague NaN NaN NaN NaN NaN
# Copenhagen NaN NaN NaN NaN NaN
# Santiago NaN NaN NaN NaN NaN
# Barcelona NaN NaN NaN NaN NaN
# Berlin NaN NaN NaN NaN NaN
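for index, row in df.iterrows():
    # .loc[row_label, column_label] avoids chained assignment,
    # which may not write through to the original frame
    df_fares_adjacency.loc[row['city_1'], row['city_0']] = row['Fare 0->1']
    df_fares_adjacency.loc[row['city_0'], row['city_1']] = row['Fare 1->0']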
# Prague Copenhagen Santiago Barcelona Berlin
# Prague NaN 1100 NaN 1150 NaN
# Copenhagen 1000 NaN NaN 1600 NaN
# Santiago NaN NaN NaN NaN 2000
# Barcelona 1200 1500 NaN NaN NaN
# Berlin NaN NaN 2050 NaN NaN
This way I'm able to get the desired matrix, but looping over a dataframe feels wrong.
Is there a more efficient and 'pandasic' way than using df.iterrows() over what could potentially be a very large dataframe?
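For reference, one loop-free direction can be sketched as follows. This is only a sketch under stated assumptions, not a verified answer for large data: it stacks both fare directions into a long table and reshapes it with DataFrame.pivot. The names edges, origin, dest, fare and adjacency are made up for illustration, and it assumes each ordered pair of cities appears at most once (pivot raises a ValueError on duplicate pairs).

```python
import pandas as pd

data = {
    'id': [1, 419, 425, 432],
    'city_0': ['Prague', 'Prague', 'Copenhagen', 'Santiago'],
    'city_1': ['Copenhagen', 'Barcelona', 'Barcelona', 'Berlin'],
    'Fare 0->1': [1000, 1200, 1500, 2050],
    'Fare 1->0': [1100, 1150, 1600, 2000],
}
df = pd.DataFrame(data)

# Long table of directed edges: one row per (origin, destination, fare).
# The column names here are illustrative, not part of the original data.
forward = pd.DataFrame({'origin': df['city_0'], 'dest': df['city_1'],
                        'fare': df['Fare 0->1']})
backward = pd.DataFrame({'origin': df['city_1'], 'dest': df['city_0'],
                         'fare': df['Fare 1->0']})
edges = pd.concat([forward, backward], ignore_index=True)

# Cities in order of first appearance (city_0 values first, then city_1).
cities = pd.concat([df['city_0'], df['city_1']]).unique()

# Columns are origins and rows are destinations, so that
# adjacency[X][Y] is the fare for going from X to Y.
adjacency = (
    edges.pivot(index='dest', columns='origin', values='fare')
         .reindex(index=cities, columns=cities)
)
print(adjacency)
```

If the same ordered pair can occur more than once, pivot_table with an explicit aggfunc (for example 'min') would be one way to decide which fare to keep, though which rule is appropriate depends on the data.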