How can I use groupby on a DataFrame whose column names are tuples?

Question

I have a DataFrame that I created inside four nested loops. I'm not sure this is the best way to do it, but after a lot of searching I only managed to create a DataFrame with tuples of length 4 as column names. I now need to group the columns by conditions on some of the tuple entries, not necessarily in order. Here's an example of what I have:

import numpy as np
import pandas as pd
from collections import namedtuple

tuplekey = namedtuple("tuplekey", ["key1","key2","key3","key4"])

randomarray = np.random.rand(10)

list1 = []
for i in range(0,2):
    list2 = []
    for j in range(0,2):
        list3 = []
        for k in range(0,2):
            list4 = []
            for l in range(0,2):

                key = tuplekey('I'+str(i), 'J'+str(j), 'K'+str(k), 'L'+str(l))
                df1 = pd.DataFrame({key:[randomarray]})
                list4.append(df1)

            df2 = pd.concat(list4, axis=1)
            list3.append(df2)

        df3 = pd.concat(list3, axis=1)
        list2.append(df3)

    df4 = pd.concat(list2, axis=1)
    list1.append(df4)

df = pd.concat(list1, axis=1)

list(df.columns.values)

>>> [('I0', 'J0', 'K0', 'L0'),
 ('I0', 'J0', 'K0', 'L1'),
 ('I0', 'J0', 'K1', 'L0'),
 ('I0', 'J0', 'K1', 'L1'),
 ('I0', 'J1', 'K0', 'L0'),
 ('I0', 'J1', 'K0', 'L1'),
 ('I0', 'J1', 'K1', 'L0'),
 ('I0', 'J1', 'K1', 'L1'),
 ('I1', 'J0', 'K0', 'L0'),
 ('I1', 'J0', 'K0', 'L1'),
 ('I1', 'J0', 'K1', 'L0'),
 ('I1', 'J0', 'K1', 'L1'),
 ('I1', 'J1', 'K0', 'L0'),
 ('I1', 'J1', 'K0', 'L1'),
 ('I1', 'J1', 'K1', 'L0'),
 ('I1', 'J1', 'K1', 'L1')]

I now need to group by "I1", and then by "K1" and "K2".

I tried using

group = df.groupby(["I1"])

but this gives the following error:

ValueError: Grouper for 'I1' not 1-dimensional

I understand that this is wrong since my column names are tuples of length 4, but I don't know how to say

df.groupby(["I1",*,*,*])

where each * is a "wildcard".

I looked for that error and found this answer giving a solution for it. Since I have 4 keys instead of 2, I tried:

df1.rename(columns={ key[3] : {key[2] : { key[0]:key[1] }}}, inplace=True)

But this gives the error

TypeError: unhashable type: 'dict'

So how can I group by "I1" (and further by "I1" and "K1", and so on) in this case?

Finally, I want to add that I don't need the column names of the DataFrame to be tuples; I just need to keep the information from each loop. I'm trying to use pandas because later on I would like to plot parts of the DataFrame with seaborn. If you think there's a better way to build this DataFrame so that it is easier to work with later, please don't hesitate to tell me.

Did you try `df.groupby([["I1"]])` instead of `df.groupby(["I1"])`? Also, are you aware that you created a `MultiIndex` DataFrame? Just asking, as it's not mentioned anywhere. — Ben.T, Apr 26 '18 at 18:59
@Ben.T `df.groupby([["I1"]])` gives `ValueError: Grouper and axis must be same length`. And no, I was definitely not aware I created a MultiIndex DataFrame. I will now search for a solution among MultiIndex DataFrames! Thanks for your input! — lanadaquenada, Apr 26 '18 at 19:36
OK, it works for me, not sure why (I copied your code). So if you just want 16 regular columns named according to your list, there is a simpler way. I'll write an answer. — Ben.T, Apr 26 '18 at 20:04

score 1 · Accepted Answer · answered Apr 26 '18 at 20:19

To easily create a DataFrame with 16 columns named with tuples, you can do:

import pandas as pd
import itertools
list_ind = [['I0', 'I1'], ['J0', 'J1'], ['K0', 'K1'], ['L0', 'L1']]
list_col = list(itertools.product(*list_ind))  # Cartesian product: one entry from each list, 16 tuples in total
df1 = pd.DataFrame(columns = list_col )

Note that this DataFrame is empty: it has the 16 columns but no rows.

Then, if you want to group by the columns whose tuple starts with I1, for example, you can do:
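Since the frame above has no rows, a populated variant may be easier to experiment with. The following is only a sketch: it builds the same 16 columns with `pd.MultiIndex.from_product` and fills them with random numbers. The level names `key1` to `key4` are borrowed from the question's namedtuple, and the 10 rows of random data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same four lists as above; the level names are an assumption taken from the question.
levels = [["I0", "I1"], ["J0", "J1"], ["K0", "K1"], ["L0", "L1"]]
names = ["key1", "key2", "key3", "key4"]

# One column per combination of the four lists (16 in total).
columns = pd.MultiIndex.from_product(levels, names=names)

# Illustrative data only: 10 rows of random numbers per column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, len(columns))), columns=columns)

print(df.shape)
print(df.columns[:3].tolist())
```

Naming the levels is optional, but it lets later selections refer to `key1` or `key3` by name instead of by position in the tuple.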

list_I1 = [tup for tup in df1.columns if tup[0] == 'I1']
group = df1.groupby(list_I1)
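As an alternative to the list comprehension above, the "wildcard" idea from the question can also be expressed on the column levels directly when the columns form a MultiIndex. This is a sketch under assumptions: the level names `key1` to `key4` come from the question's namedtuple, the numbers are placeholder data, and taking the mean is just one possible aggregation. Grouping is done on the transposed frame, which avoids the `axis=1` argument of `groupby` that recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["I0", "I1"], ["J0", "J1"], ["K0", "K1"], ["L0", "L1"]],
    names=["key1", "key2", "key3", "key4"],  # assumed level names
)
# Placeholder data: 2 rows, values 0..31, so results are easy to check by hand.
df = pd.DataFrame(np.arange(32, dtype=float).reshape(2, 16), columns=columns)

# "Wildcard" selection: every column whose first entry is I1.
only_i1 = df.xs("I1", axis=1, level="key1")

# Wildcards in several positions: first entry I1, third entry K1, anything else.
idx = pd.IndexSlice
i1_k1 = df.loc[:, idx["I1", :, "K1", :]]

# Group the columns by one or more levels: transpose, group the rows, transpose back.
mean_by_key1 = df.T.groupby(level="key1").mean().T
mean_by_key1_key3 = df.T.groupby(level=["key1", "key3"]).mean().T

print(only_i1.shape, i1_k1.shape)
print(mean_by_key1)
```

Note that `xs` drops the selected level from the result by default, while the `IndexSlice` selection keeps all four levels.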

Is this what you are looking for?
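Since the question mentions that the tuple names are not required and that the data will be plotted with seaborn later, another option is a "long" table in which each loop variable becomes an ordinary column. The sketch below is one possible way to get there, not the only one; the column names `key1` to `key4`, `sample`, and `value` are assumptions chosen for illustration, and the data is random.

```python
import numpy as np
import pandas as pd

names = ["key1", "key2", "key3", "key4"]  # assumed names, taken from the question
columns = pd.MultiIndex.from_product(
    [["I0", "I1"], ["J0", "J1"], ["K0", "K1"], ["L0", "L1"]], names=names
)
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.random((10, 16)), columns=columns)

# Move the four levels into regular columns, then melt the 10 samples into rows.
long_df = wide.T.reset_index().melt(
    id_vars=names, var_name="sample", value_name="value"
)

# With plain columns, groupby works with ordinary column names.
mean_by_i = long_df.groupby("key1")["value"].mean()
mean_by_i_k = long_df.groupby(["key1", "key3"])["value"].mean()

print(long_df.head())
print(mean_by_i_k)
```

A table in this shape can usually be handed to seaborn directly, for example with `x="key1"`, `y="value"` and `hue="key3"` in a call such as `sns.boxplot(data=long_df, ...)`.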
