I have a DataFrame like this:

| date     | dimension A| dimension B| dimension C| dimension D| counts     |
+----------+------------+------------+------------+------------+------------+
| 1-2-2001 | a1         | b1         | c1         | d1         | 52         |
| 1-1-2001 | a2         | b2         | c2         | d2         | 33         |
| 1-2-2001 | a3         | b3         | c3         | d3         | 41         |
| 1-1-2001 | a4         | b4         | c4         | d4         | 19         |

I want Python to run df.groupby automatically for every combination of two dimensions and create a new DataFrame from each result, i.e. the following:

df1 = df.groupby(['date', 'dimension A']).sum()
df2 = df.groupby(['date', 'dimension B']).sum()
...
df5 = df.groupby(['dimension A', 'dimension B']).sum()
...
df10 = df.groupby(['dimension C', 'dimension D']).sum()

What should I do?

NovaPoi

1 Answer

You can use `itertools.combinations` to generate every pair of columns. Then you can collect the resulting GroupBy objects or aggregated DataFrames in a list (or a dictionary):

from itertools import combinations

dfs = []

for i, j in combinations(df.columns, 2):
    dfs.append(df.groupby([i, j])) # or df.groupby([i, j]).mean()

You can also use a list (or dict) comprehension:

[df.groupby([i, j]) for i, j in combinations(df.columns, 2)]
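
Note that `combinations(df.columns, 2)` also pairs each dimension with the counts column, which gives 15 pairs rather than the 10 listed in the question. A minimal sketch of the dictionary variant, assuming that every column except counts is a grouping key and that only counts should be summed (the names `keys` and `results` are just for illustration):

```python
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["1-2-2001", "1-1-2001", "1-2-2001", "1-1-2001"],
    "dimension A": ["a1", "a2", "a3", "a4"],
    "dimension B": ["b1", "b2", "b3", "b4"],
    "dimension C": ["c1", "c2", "c3", "c4"],
    "dimension D": ["d1", "d2", "d3", "d4"],
    "counts": [52, 33, 41, 19],
})

# Assumption: every column except "counts" is a grouping key.
keys = df.columns.drop("counts")

# One aggregated Series per pair of keys, stored under the pair itself.
results = {
    pair: df.groupby(list(pair))["counts"].sum()
    for pair in combinations(keys, 2)
}

print(len(results))  # 10 pairs from 5 grouping columns
print(results[("date", "dimension A")])
```

Keying the dictionary by the column pair means each result can be looked up by name, e.g. `results[("dimension C", "dimension D")]`, instead of by position in a list.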
Mykola Zotko
  • Thanks for your prompt answer. I tried your method like this: `dfs = [df.groupby([i, j]).sum() for i, j in combinations(df.columns, 2)]` followed by `dfs`, and then got the error message: `can only concatenate str (not "int") to str`. What am I doing wrong here? – NovaPoi Nov 25 '20 at 07:01
  • See this [answer](https://stackoverflow.com/questions/25039626/how-do-i-find-numeric-columns-in-pandas) on how to select numeric columns. Do that before the `for` loop. Otherwise, you need to clean your data. – Mykola Zotko Nov 25 '20 at 07:08
  • You are right - I cleaned my data and everything went well. Thanks again. – NovaPoi Nov 25 '20 at 07:26
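
For later readers, the two fixes suggested in the comments could look roughly like the sketch below. The messy data here is hypothetical (the original data was not posted); it assumes the counts column arrived as text with one unparseable entry:

```python
import pandas as pd

# Hypothetical messy data: "counts" is stored as text, with one bad entry.
raw = pd.DataFrame({
    "date": ["1-2-2001", "1-1-2001", "1-2-2001"],
    "dimension A": ["a1", "a2", "a1"],
    "counts": ["52", "33", "n/a"],
})

cleaned = raw.copy()
# Cleaning step: convert to numbers; values that cannot be parsed become NaN.
cleaned["counts"] = pd.to_numeric(cleaned["counts"], errors="coerce")

# Selection step: keep only the columns pandas treats as numeric.
numeric_cols = cleaned.select_dtypes(include="number").columns.tolist()

# Sum only the numeric columns; NaN values are skipped by default.
totals = cleaned.groupby(["date", "dimension A"])[numeric_cols].sum()
print(totals)
```

Whether coercing bad values to NaN is acceptable depends on the data; inspecting the rows that fail to parse before discarding them is usually the safer choice.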