I have a dataframe in pandas that looks like this:
In[12]: a
Out[12]:
alpha beta gamma G1 G2 G3 \
0 7 1 1 5.617164 9.048353 2.229996
1 7 2 2 1469.200000 1475.224000 1468.049000
2 7 3 62 5.595052 8.975431 2.203285
3 7 4 62 355.046700 -29.822800 6.901535
4 7 5 62 706.328700 -67.995310 9.047969
5 7 6 62 882.444500 -87.075830 10.673920
6 7 7 62 706.259600 -67.769160 9.366320
7 7 8 62 355.401700 -29.832250 6.651977
8 7 9 62 5.669636 9.041898 2.168643
9 7 10 12 5.663535 9.039821 2.166068
10 7 11 22 1469.277000 1475.238000 1468.002000
G4 G5 G6
0 -67.46422 -22.76504 -142.4664
1 1397.88500 1450.74000 1328.4590
2 -67.48421 -22.68970 -142.5102
3 -68.16720 -22.94553 -134.2837
4 -68.53533 -22.72636 -128.2633
5 -68.89483 -22.90121 -129.0006
6 -68.86677 -22.81106 -127.4094
7 -68.27580 -22.86239 -133.6061
8 -67.44511 -22.82177 -142.4721
9 -67.44169 -22.79920 -142.4393
10 1397.89300 1450.75400 1328.5100
I am using the pandas groupby method to group the dataframe by the columns 'alpha' and 'beta'. When I run it, the groups come out as intended, in the same order as the rows:
b = a.groupby(['alpha','beta'])
b.groups
Out[18]:
{(7, 1): Index(['0'], dtype='object'),
(7, 2): Index(['1'], dtype='object'),
(7, 3): Index(['2'], dtype='object'),
(7, 4): Index(['3'], dtype='object'),
(7, 5): Index(['4'], dtype='object'),
(7, 6): Index(['5'], dtype='object'),
(7, 7): Index(['6'], dtype='object'),
(7, 8): Index(['7'], dtype='object'),
(7, 9): Index(['8'], dtype='object'),
(7, 10): Index(['9'], dtype='object'),
(7, 11): Index(['10'], dtype='object')}
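For anyone who wants to reproduce this without the full data, here is a minimal sketch. It assumes a hypothetical, cut-down stand-in for 'a' with only 'alpha', 'beta' and one value column filled with placeholder numbers (not the real data), grouped exactly as above:

```python
import pandas as pd

# Hypothetical cut-down stand-in for `a`: alpha/beta as in the question,
# G1 filled with placeholder values (not the real data)
a = pd.DataFrame({
    "alpha": [7] * 11,
    "beta": list(range(1, 12)),
    "G1": [float(i) for i in range(11)],
})

b = a.groupby(["alpha", "beta"])

# With integer keys the groups come out in numeric order: (7, 1), (7, 2), ..., (7, 11)
print(list(b.groups))
```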
However, the function I will use later requires the values inside these group-key tuples to be strings, not integers. So I convert the 'alpha' and 'beta' columns of 'a' from integers to strings:
a[['alpha','beta']] = a[['alpha','beta']].astype(str)
I then group the dataframe by 'alpha' and 'beta' exactly as before. HOWEVER, this time the groups are no longer in the original row order:
b = a.groupby(['alpha','beta'])
b.groups
Out[26]:
{('7', '1'): Index(['0'], dtype='object'),
('7', '10'): Index(['9'], dtype='object'),
('7', '11'): Index(['10'], dtype='object'),
('7', '2'): Index(['1'], dtype='object'),
('7', '3'): Index(['2'], dtype='object'),
('7', '4'): Index(['3'], dtype='object'),
('7', '5'): Index(['4'], dtype='object'),
('7', '6'): Index(['5'], dtype='object'),
('7', '7'): Index(['6'], dtype='object'),
('7', '8'): Index(['7'], dtype='object'),
('7', '9'): Index(['8'], dtype='object')}
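A likely explanation, which is my assumption rather than something confirmed from the pandas source: groupby's default sort=True sorts the group keys, and once the columns hold strings they are compared character by character, so '10' and '11' sort before '2'. The sketch below uses a hypothetical cut-down frame with placeholder values. It also shows that sort=False changes the order in which iterating over the GroupBy object yields groups; whether the .groups dict follows that order may depend on the pandas version, so only iteration order is checked here.

```python
import pandas as pd

# Plain Python: strings compare character by character, so "10" < "2"
print(sorted(["1", "2", "10", "11"]))  # ['1', '10', '11', '2']

# Hypothetical cut-down version of `a` (placeholder values), then the same conversion
a = pd.DataFrame({
    "alpha": [7] * 11,
    "beta": list(range(1, 12)),
    "G1": [float(i) for i in range(11)],
})
a[["alpha", "beta"]] = a[["alpha", "beta"]].astype(str)

# Default sort=True: keys come out in lexicographic (string) order
sorted_keys = [key for key, _ in a.groupby(["alpha", "beta"])]

# sort=False: iteration follows the order in which each key first appears in the rows
appearance_keys = [key for key, _ in a.groupby(["alpha", "beta"], sort=False)]

print(sorted_keys[:4])      # [('7', '1'), ('7', '10'), ('7', '11'), ('7', '2')]
print(appearance_keys[:4])  # [('7', '1'), ('7', '2'), ('7', '3'), ('7', '4')]
```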
As you can see, the groups ('7', '10') and ('7', '11') now come before ('7', '2'). I have tried passing sort=False to groupby, but it returns the same thing. I would like to know how to stop this from happening, but I am also curious WHY it happens (I'm trying to understand pandas better). Is there some ordering that pandas applies when the dtype is 'object' instead of 'integer'?
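One possible workaround, sketched under the assumption that the downstream function only needs string keys and not string columns: keep 'alpha' and 'beta' as integers for the groupby, so the keys sort numerically, and convert each key tuple to strings only when building the mapping. The frame is again a hypothetical cut-down version with placeholder values, and groups_by_str_key is an illustrative name, not a pandas API.

```python
import pandas as pd

# Hypothetical cut-down version of `a`, with alpha/beta still stored as integers
a = pd.DataFrame({
    "alpha": [7] * 11,
    "beta": list(range(1, 12)),
    "G1": [float(i) for i in range(11)],
})

# Group on the integer columns (numeric key order), then convert each key to a
# tuple of strings; dicts keep insertion order, so the numeric order survives.
groups_by_str_key = {
    (str(alpha), str(beta)): grp.index
    for (alpha, beta), grp in a.groupby(["alpha", "beta"])
}

print(list(groups_by_str_key)[:4])  # [('7', '1'), ('7', '2'), ('7', '3'), ('7', '4')]
```

If the columns really must stay strings, sorting the existing keys numerically, e.g. sorted(b.groups, key=lambda k: tuple(map(int, k))), is another option.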
Thanks in advance.