1

I have two tables for mean and std, I would like to calculate the maximum over some columns for the mean table, And I would like to get corresponding rows for the std table.

mean_df = pd.read_csv(r'./csvs/mean.csv')

std_df = pd.read_csv(r'./csvs/std.csv')

#mean_df, std_df are of same size

grouped_df = mean_df.groupby(['alpha', 'beta'])

columns = ['val']
max_df = grouped_df[columns].agg(['max'])

# Here i want the corresponding std_max_df table for the max_df. i.e., for every max calculated from mean, i want the std of that max in a new table.

For example:

input mean_df is

alpha beta gamma val

1 2 3 100

4 6 8 200

1 2 9 400

4 6 7 500

3 5 8 600


input std_df is

alpha beta gamma val

1 2 3 300

4 6 8 500

1 2 9 100

4 6 7 700

3 5 8 900


output will be

alpha beta gamma max_mean_val corresp_std_val

1 2 9 400 100

4 6 7 500 700

3 5 8 600 900


charanReddy
  • 137
  • 1
  • 11
  • What is your question exactly? Please read [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask). Particularly, the part with "Introduce the problem before you post any code". – darcamo Sep 23 '20 at 18:08
  • Can you look into it now? I have two tables basically, I calculate max over some columns in table1, i want corresponding rows in table 2 – charanReddy Sep 23 '20 at 18:53
  • Check [this question](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-count-in-groups-using-groupby). – darcamo Sep 24 '20 at 07:01
  • Does this answer your question? [Get the Row(s) which have the max count in groups using groupby](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-count-in-groups-using-groupby) – darcamo Sep 24 '20 at 07:02

1 Answers1

1

First merge DataFrames together by inner or left join and then get rows with maximal index values by DataFrameGroupBy.idxmax:

df = mean_df.merge(std_df, on=['alpha', 'beta','gamma'], 
                           how='left', 
                           suffixes=('_mean','_std'))

df = df.loc[df.groupby(['alpha', 'beta'])['val_mean'].idxmax()]
print (df)
   alpha  beta  gamma  val_mean  val_std
2      1     2      9       400      100
4      3     5      8       600      900
3      4     6      7       500      700

For multiple columns one possible idea:

print (mean_df)

   alpha  beta  gamma  val  val1
0      1     2      3  100     5
1      4     6      8  200     9
2      1     2      9  400     3
3      4     6      7  500     5
4      3     5      8  600     2

print (std_df)
   alpha  beta  gamma  val  val1
0      1     2      3  300     5
1      4     6      8  500     7
2      1     2      9  100     9
3      4     6      7  700     2
4      3     5      8  900     1


df = mean_df.merge(std_df, on=['alpha', 'beta','gamma'], 
                           how='left', 
                           suffixes=('_mean','_std'))

cols = ['val','val1']

dfs = [df.loc[df.groupby(['alpha', 'beta'])[f'{col}_mean'].idxmax(), 
              [f'{col}_mean', f'{col}_std']].reset_index(drop=True) for col in cols]

df0 = df[['alpha', 'beta','gamma']].drop_duplicates(['alpha','beta']).reset_index(drop=True)
df = pd.concat([df0] + dfs, axis=1)
print (df)
   alpha  beta  gamma  val_mean  val_std  val1_mean  val1_std
0      1     2      3       400      100          5         5
1      4     6      8       600      900          2         1
2      3     5      8       500      700          9         7
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Its a good idea, I was also trying to use merge, How do you tackle the problem of multiple val columns, i.e., val1_mean, val2_mean, val1_std, val2_std – charanReddy Sep 24 '20 at 20:23
  • for multiple val columns i solved it by taking df.loc multiple times and assigning 'valx_std' columns to a max of mean_df. Is there any better answer for this – charanReddy Sep 24 '20 at 22:21
  • 1
    @charanReddy - Working for solution, give em a sec. – jezrael Sep 25 '20 at 06:24