I have two huge dataframes that share the same id field. I want to build a simple summary dataframe that shows the maximum of specific columns for each id. I understand iterrows() is frowned upon, so are there a couple of one-liners to do this? I don't understand lambda/apply very well, but maybe they would work here.
Stand-alone example
import pandas as pd
myid = [1,1,2,3,4,4,5]
name =['A','A','B','C','D','D','E']
x = [15,12,3,3,1,4,8]
df1 = pd.DataFrame(list(zip(myid, name, x)),
                   columns=['myid', 'name', 'x'])
display(df1)
myid = [1,2,2,2,3,4,5,5]
name =['A','B','B','B','C','D','E','E']
y = [9,6,3,4,6,2,8,2]
df2 = pd.DataFrame(list(zip(myid, name, y)),
                   columns=['myid', 'name', 'y'])
display(df2)
mylist = df1['myid'].unique()
df_summary = pd.DataFrame(mylist, columns=['MY_ID'])
## do work here...
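For what it's worth, here is a rough sketch of what the "do work here" step might look like without iterrows(). It assumes the summary should hold the per-id maximum of x from the first dataframe and of y from the second. The column names max_x and max_y are made up for illustration, and the outer merge is an assumption so that an id present in only one dataframe would still show up (with NaN on the other side).

```python
import pandas as pd

# Same sample data as in the stand-alone example
df1 = pd.DataFrame({'myid': [1, 1, 2, 3, 4, 4, 5],
                    'name': ['A', 'A', 'B', 'C', 'D', 'D', 'E'],
                    'x': [15, 12, 3, 3, 1, 4, 8]})
df2 = pd.DataFrame({'myid': [1, 2, 2, 2, 3, 4, 5, 5],
                    'name': ['A', 'B', 'B', 'B', 'C', 'D', 'E', 'E'],
                    'y': [9, 6, 3, 4, 6, 2, 8, 2]})

# Per-id maximum of each column of interest; as_index=False keeps myid as a column
max_x = df1.groupby('myid', as_index=False)['x'].max()
max_y = df2.groupby('myid', as_index=False)['y'].max()

# Combine on the shared id; how='outer' is an assumption (keeps ids found in only one frame)
df_summary = max_x.merge(max_y, on='myid', how='outer')

# Hypothetical output column names
df_summary = df_summary.rename(columns={'myid': 'MY_ID', 'x': 'max_x', 'y': 'max_y'})
print(df_summary)
```

If name should appear in the summary too, grouping on ['myid', 'name'] instead of 'myid' in both groupby calls (and merging on both columns) would probably do it, assuming each id always maps to the same name.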
Desired output