I have a dataframe with the following structure:
import pandas as pd
df = pd.DataFrame({'TIME': list('12121212'), 'NAME': list('aabbccdd'), 'CLASS': list('AAAABBBB'),
                   'GRADE': [4, 5, 4, 5, 4, 5, 4, 5]}, columns=['TIME', 'NAME', 'CLASS', 'GRADE'])
print(df)
  TIME NAME CLASS  GRADE
0    1    a     A      4
1    2    a     A      5
2    1    b     A      4
3    2    b     A      5
4    1    c     B      4
5    2    c     B      5
6    1    d     B      4
7    2    d     B      5
What I need to do is split the above dataframe into multiple dataframes according to the variable CLASS, convert each dataframe from long to wide (such that we have the NAME values as columns and GRADE as the main entry in the data matrix), and then iterate other functions over the smaller CLASS dataframes. If I create a dict object as suggested here, I obtain:
d = dict(tuple(df.groupby('CLASS')))
print(d)
{'A': TIME NAME CLASS GRADE
0 1 a A 4
1 2 a A 5
2 1 b A 4
3 2 b A 5, 'B': TIME NAME CLASS GRADE
4 1 c B 4
5 2 c B 5
6 1 d B 4
7 2 d B 5}
In order to convert the dataframe from long to wide, I used the function pivot_table from pandas:
for names, classes in d.items():
    newdata = df.pivot_table(index="TIME", columns="NAME", values="GRADE")
print(newdata)
NAME a b c d
TIME
1 4 4 4 4
2 5 5 5 5
So far so good. However, once I obtain the newdata dataframe I am not able to access the smaller dataframes created in d, since the variable CLASS is now missing from the dataframe (as it should be). Suppose I then need to iterate a function over the two smaller subframes CLASS==A and CLASS==B. How would I be able to do this using a for loop if I am not able to define the dataset structure using the column CLASS?
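For reference, here is a minimal sketch of the kind of result I am after. It assumes that pivoting each sub-frame separately (instead of the full df) is acceptable, so that the CLASS label survives as the dictionary key rather than as a column. The names wide, column_means and results are hypothetical and used only for illustration; column_means stands in for whatever function needs to be applied per class.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "TIME": list("12121212"),
        "NAME": list("aabbccdd"),
        "CLASS": list("AAAABBBB"),
        "GRADE": [4, 5, 4, 5, 4, 5, 4, 5],
    },
    columns=["TIME", "NAME", "CLASS", "GRADE"],
)

# Pivot each CLASS group on its own sub-frame (not on the full df).
# The dict key carries the CLASS label, so it is not lost after pivoting.
wide = {
    cls: sub.pivot_table(index="TIME", columns="NAME", values="GRADE")
    for cls, sub in df.groupby("CLASS")
}


def column_means(frame):
    # Hypothetical placeholder for "other functions" applied per class.
    return frame.mean()


# Iterate over the per-class wide frames with an ordinary loop/comprehension.
results = {cls: column_means(frame) for cls, frame in wide.items()}

for cls, frame in wide.items():
    print(cls)
    print(frame)
```

With this layout, wide['A'] only contains the columns a and b, and wide['B'] only contains c and d, so any later loop can pick the sub-frame by its class key.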