I'm trying to write a code that should give me a summary of the data provided. However, there are few problems I'm facing right now.
When I use the groupby function, I get the following output.
I want the output to give me a sorted result. For example, the Airport with the highest "SkyTeam" Customer appear at the top. I've written some parts of the code but couldn't find a way to connect them.
#(Airport name, Skyteam passenger, airport total passenger)
#Sorted with 2 rule, "Skyteam Highest to Lowest" and "Total pax Highest to lowest"
('JFK', '12903454','37796770')
('IAD', '1050960', '14616186')
df_3 = pd.merge(df_initial, df_info[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_5 = df_3.groupby(['Origin Airport','Operating Airline Alliance'])["Operating Airline Capacity"].sum()
df_6 = df_3.groupby(['Origin Airport'])["Operating Airline Capacity"].sum()
for Airports in df_6.items() :
print(Airports)
The output:
('IAD', 14616186)
('JFK', 37796770)
Origin Airport Operating Airline Alliance
IAD LCC 292410
Oneworld 955450
Qatar 129210
SkyTeam 1050960
Star Alliance 7458776
US-Domestic 598389
JFK LCC 8735749
Oneworld 6530773
Qatar 240528
SkyTeam 12903454
Star Alliance 2722954
US-Domestic 217540
Name: Operating Airline Capacity, dtype: int64
Thanks to everyone who've replied to my question. I've went on with M.Erkin's approach and edited the code he provided a bit.
df_3 = pd.merge(df_initial, df_info[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_3 = df_3.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"],as_index=False).agg({"Operating Airline Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
df_4 = pd.merge(df_initial, df_infoTK[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_4 = df_4.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"]).agg({"Operating Airline Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
df_5 = pd.merge(df_initial, df_infoQT[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_5 = df_5.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"]).agg({"Operating Airline Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
In the final part i've consolidated the results with a merge function. I did it in 2 steps because i don't really know how to do it in 1 step but if you do, please feel free to advise!
df_6 = pd.merge(df_3, df_4[["Operating Airline Capacity", "DepCount"]], on='Origin Airport', how='left')
#df_6.fillna(0,inplace=True)
df_7 = pd.merge(df_6 ,df_5[["Operating Airline Capacity", "DepCount"]], on='Origin Airport', how='left')
Leaving this here. It might help someone seeking similar projects!