I have a dataframe A that has three columns: 'id', 'value' and 'date'. I was doing a groupby by id, and noticed something strange: if I do two successive sorts for rows with a given id, first by value and then date, the order of operations affects the order of rows. Notice the order of rows with index 42915 and 42916:
A.sort_values('value').sort_values('date')[A.sort_values('value').sort_values('date')['id'] == '0001249666']
id value date
42913 0001249666 113845.0 20130408
42914 0001249666 114597.0 20130430
42916 0001249666 125972.0 20140414
42915 0001249666 125971.0 20140414
42917 0001249666 136154.0 20150410
42918 0001249666 145551.0 20160407
42919 0001249666 152911.0 20170413
A[A['id'] == '0001249666'].sort_values('value').sort_values('date')
id value date
42913 0001249666 113845.0 20130408
42914 0001249666 114597.0 20130430
42915 0001249666 125971.0 20140414
42916 0001249666 125972.0 20140414
42917 0001249666 136154.0 20150410
42918 0001249666 145551.0 20160407
42919 0001249666 152911.0 20170413
If I however use .sort_values with two arguments, the order doesn't matter.
A.sort_values(['date','value'])[A.sort_values(['date','value'])['id'] == '0001249666']
id value date
42913 0001249666 113845.0 20130408
42914 0001249666 114597.0 20130430
42915 0001249666 125971.0 20140414
42916 0001249666 125972.0 20140414
42917 0001249666 136154.0 20150410
42918 0001249666 145551.0 20160407
42919 0001249666 152911.0 20170413
a[a['id'] == '0001249666'].sort_values(['date','value'])
id value date
42913 0001249666 113845.0 20130408
42914 0001249666 114597.0 20130430
42915 0001249666 125971.0 20140414
42916 0001249666 125972.0 20140414
42917 0001249666 136154.0 20150410
42918 0001249666 145551.0 20160407
42919 0001249666 152911.0 20170413
Now I know that what I'm doing isn't the smartest way of doing what I want, but I am really interested in understanding what could explain this behaviour? What am I implicitly assuming when I don't understand this behaviour.