I have a DataFrame that I created inside 4 nested loops. I'm not sure this is the best way to do it, but after a lot of research I only managed to create a DataFrame with tuples of length 4 as column names. I now need to group the columns by conditions on some of the entries in the tuple, not necessarily in order. Here's an example of what I have:
import numpy as np
import pandas as pd
from collections import namedtuple
tuplekey = namedtuple("tuplekey", ["key1","key2","key3","key4"])
randomarray = np.random.rand(10)
list1 = []
for i in range(0,2):
    list2 = []
    for j in range(0,2):
        list3 = []
        for k in range(0,2):
            list4 = []
            for l in range(0,2):
                key = tuplekey('I'+str(i), 'J'+str(j), 'K'+str(k), 'L'+str(l))
                df1 = pd.DataFrame({key:[randomarray]})
                list4.append(df1)
            df2 = pd.concat(list4, axis=1)
            list3.append(df2)
        df3 = pd.concat(list3, axis=1)
        list2.append(df3)
    df4 = pd.concat(list2, axis=1)
    list1.append(df4)
df = pd.concat(list1, axis=1)
list(df.columns.values)
>>> [('I0', 'J0', 'K0', 'L0'),
('I0', 'J0', 'K0', 'L1'),
('I0', 'J0', 'K1', 'L0'),
('I0', 'J0', 'K1', 'L1'),
('I0', 'J1', 'K0', 'L0'),
('I0', 'J1', 'K0', 'L1'),
('I0', 'J1', 'K1', 'L0'),
('I0', 'J1', 'K1', 'L1'),
('I1', 'J0', 'K0', 'L0'),
('I1', 'J0', 'K0', 'L1'),
('I1', 'J0', 'K1', 'L0'),
('I1', 'J0', 'K1', 'L1'),
('I1', 'J1', 'K0', 'L0'),
('I1', 'J1', 'K0', 'L1'),
('I1', 'J1', 'K1', 'L0'),
('I1', 'J1', 'K1', 'L1')]
I would now need to group by "I1", and then group by "K1" and "K2".
I tried using
group = df.groupby(["I1"])
but this gives the following error:
ValueError: Grouper for 'I1' not 1-dimensional
I understand that this is wrong since my column names are tuples of length 4, but I don't know how to say
df.groupby(["I1",*,*,*])
where each * is a "wildcard".
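To make the wildcard idea concrete, here is a minimal sketch of the behaviour I am after. It assumes the columns are a proper 4-level pandas MultiIndex rather than plain tuples, which is not what my code above produces; the level names key1..key4 are reused from my namedtuple, and the random data is only a stand-in.

```python
import numpy as np
import pandas as pd

# Assumption: the columns form a 4-level MultiIndex (not plain tuples),
# with level names borrowed from the namedtuple fields in the question.
labels = [[f"{prefix}{n}" for n in range(2)] for prefix in "IJKL"]
columns = pd.MultiIndex.from_product(
    labels, names=["key1", "key2", "key3", "key4"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 16)), columns=columns)  # stand-in data

# ("I1", *, *, *): every column whose first level equals "I1"
i1 = df.xs("I1", axis=1, level="key1")

# ("I1", *, "K1", *): slice(None), written ":" here, acts as the wildcard
i1_k1 = df.loc[:, pd.IndexSlice["I1", :, "K1", :]]

# Group the columns by key1 and key3 and average within each group.
# Transposing first avoids grouping along axis=1 directly.
grouped = df.T.groupby(level=["key1", "key3"]).mean().T
```

With this layout the wildcard is just a full slice on the levels I don't care about, and grouping on any subset of levels is done by naming them.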
I looked for that error and found this answer giving a solution for it. Since I have 4 keys instead of 2, I tried:
df1.rename(columns={ key[3] : {key[2] : { key[0]:key[1] }}}, inplace=True)
But this gives the error
TypeError: unhashable type: 'dict'
So how could I group by "I1" (and further by "I1" and "K1", and so on) in this case?
Finally, I want to add that I don't need the column names of the DataFrame to be tuples; I just need to keep the information from each loop. I'm trying to use pandas because later on I would like to plot parts of the DataFrame using seaborn. If you think there's a better way to build this DataFrame so that I can operate on it more easily later, please don't hesitate to tell me.
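One alternative layout I could imagine is a long ("tidy") table with one row per value and the four loop keys as ordinary columns. The sketch below is only an illustration under my own assumptions: the column names key1..key4, step and value are made up for the example, and the random numbers stand in for whatever each loop iteration really computes.

```python
import numpy as np
import pandas as pd
from itertools import product

rng = np.random.default_rng(0)
records = []
# One pass over all (i, j, k, l) combinations instead of 4 nested loops
for i, j, k, l in product(range(2), repeat=4):
    values = rng.random(10)  # stand-in for the real per-iteration result
    for step, value in enumerate(values):
        records.append({
            "key1": f"I{i}", "key2": f"J{j}",
            "key3": f"K{k}", "key4": f"L{l}",
            "step": step,    # hypothetical position within the array
            "value": value,
        })

long_df = pd.DataFrame.from_records(records)

# Grouping on any subset of the keys is then an ordinary groupby
means = long_df.groupby(["key1", "key3"])["value"].mean()
```

As far as I understand, seaborn works well with this long form, so something like sns.lineplot(data=long_df, x="step", y="value", hue="key1") should be possible, but I have not verified that this fits my actual data.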