Pandas: how to groupby on concatenated dataframes with same column names?

Question

How to properly concat (or maybe this is .merge()?) N dataframes with the same column names, so that I could groupby them with distinguished column keys. For ex:

dfs = {
  'A': df1, // columns are C1, C2, C3
  'B': df2, // same columns C1, C2, C3      
}
gathered_df = pd.concat(dfs.values()).groupby(['C2'])['C3']\
    .count()\
    .sort_values(ascending=False)\
    .reset_index()

I want to get something like

|----------|------------|-------------|
|          |    A       |     B       |
| C2_val1  | count_perA | count_perB  |
| C2_val2  | count_perA | count_perB  |
| C2_val3  | count_perA | count_perB  |

score 1 · Accepted Answer · edited Sep 27 '17 at 16:50

I think you need reset_index for create columns from MultiIndex and then add column to groupby dor distinguish dataframes. Last reshape by unstack:

gathered_df = pd.concat(dfs).reset_index().groupby(['C2','level_0'])['C3'].count().unstack()

What is the difference between size and count in pandas?

Sample:

df1 = pd.DataFrame({'C1':[1,2,3],
                   'C2':[4,5,5],
                   'C3':[7,8,np.nan]})


df2 = df1.mul(10).fillna(1)
df2.C2 = df1.C2

print (df1)
   C1  C2   C3
0   1   4  7.0
1   2   5  8.0
2   3   5  NaN

print (df2)
   C1  C2    C3
0  10   4  70.0
1  20   5  80.0
2  30   5   1.0

dfs = {
  'A': df1, 
  'B': df2     
}

gathered_df = pd.concat(dfs).reset_index().groupby(['C2','level_0'])['C3'].count().unstack()
gathered_df.index.name = None
gathered_df.columns.name = None
print (gathered_df)
   A  B
4  1  1
5  1  2

Pandas: how to groupby on concatenated dataframes with same column names?

1 Answers1