1

I have a dataframe df which indeces are

df.index
Out[4]: 
Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2',
       u'2015-03-30_p001_2', u'2015-03-31_p001_2',
       u'2015-03-31_p002_3', u'2015-04-01_p001_2',
       u'2015-04-01_p002_3', u'2015-04-02_p001_2',
       u'2015-04-02_p002_3', u'2015-04-03_p001_2',
       ...
       u'2016-03-31_p127_1', u'2016-04-01_p127_1',
       u'2016-04-01_p128_3', u'2016-04-02_p127_1',
       u'2016-04-02_p128_3', u'2016-04-03_p127_1',
       u'2016-04-03_p128_3', u'2016-04-04_p127_1',
       u'2016-04-05_p127_1', u'2016-04-06_p127_1'],
      dtype='object', length=781)

The dataframe df is the results of a merge of 2 dataframes.

As you can see from the indeces are not sorted. E.g. '2015-03-31_p002_3'(5th position) comes before '2015-04-01_p001_2' (6th position)

I would like to group together all the _p001_2 and sort them according to the date, then all the _p002_3, etc etc.

But I didnt manage to do it...

gabboshow
  • 5,359
  • 12
  • 48
  • 98
  • 2
    Possible duplicate of [Python, pandas: how to sort dataframe by index](https://stackoverflow.com/questions/22211737/python-pandas-how-to-sort-dataframe-by-index) – Zero Aug 09 '17 at 15:01

1 Answers1

0

If sort_index is not possible use, then it is a bit complicated - need create helper DataFrame by split, then sort_values and last reindex:

idx = pd.Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2',
       u'2015-03-30_p001_2', u'2015-03-31_p001_2',
       u'2015-03-31_p002_3', u'2015-04-01_p001_2',
       u'2015-04-01_p002_3', u'2015-04-02_p001_2',
       u'2015-04-02_p002_3', u'2015-04-03_p001_2',

       u'2016-03-31_p127_1', u'2016-04-01_p127_1',
       u'2016-04-01_p128_3', u'2016-04-02_p127_1',
       u'2016-04-02_p128_3', u'2016-04-03_p127_1',
       u'2016-04-03_p128_3', u'2016-04-04_p127_1',
       u'2016-04-05_p127_1', u'2016-04-06_p127_1'])

df = pd.DataFrame({'a':range(len(idx))}, index=idx)
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-03-31_p002_3   4
2015-04-01_p001_2   5
2015-04-01_p002_3   6
2015-04-02_p001_2   7
2015-04-02_p002_3   8
2015-04-03_p001_2   9
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-01_p128_3  12
2016-04-02_p127_1  13
2016-04-02_p128_3  14
2016-04-03_p127_1  15
2016-04-03_p128_3  16
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19

df = df.sort_index()
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-03-31_p002_3   4
2015-04-01_p001_2   5
2015-04-01_p002_3   6
2015-04-02_p001_2   7
2015-04-02_p002_3   8
2015-04-03_p001_2   9
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-01_p128_3  12
2016-04-02_p127_1  13
2016-04-02_p128_3  14
2016-04-03_p127_1  15
2016-04-03_p128_3  16
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19

df1 = df.index.to_series().str.split('_', expand=True)
df1[0] = pd.to_datetime(df1[0])
#if necessary change order columns for sorting 
df1 = df1.sort_values(by=[1,2,0])
print (df1)
                           0     1  2
2015-03-28_p001_2 2015-03-28  p001  2
2015-03-29_p001_2 2015-03-29  p001  2
2015-03-30_p001_2 2015-03-30  p001  2
2015-03-31_p001_2 2015-03-31  p001  2
2015-04-01_p001_2 2015-04-01  p001  2
2015-04-02_p001_2 2015-04-02  p001  2
2015-04-03_p001_2 2015-04-03  p001  2
2015-03-31_p002_3 2015-03-31  p002  3
2015-04-01_p002_3 2015-04-01  p002  3
2015-04-02_p002_3 2015-04-02  p002  3
2016-03-31_p127_1 2016-03-31  p127  1
2016-04-01_p127_1 2016-04-01  p127  1
2016-04-02_p127_1 2016-04-02  p127  1
2016-04-03_p127_1 2016-04-03  p127  1
2016-04-04_p127_1 2016-04-04  p127  1
2016-04-05_p127_1 2016-04-05  p127  1
2016-04-06_p127_1 2016-04-06  p127  1
2016-04-01_p128_3 2016-04-01  p128  3
2016-04-02_p128_3 2016-04-02  p128  3
2016-04-03_p128_3 2016-04-03  p128  3

df = df.reindex(df1.index)
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-04-01_p001_2   5
2015-04-02_p001_2   7
2015-04-03_p001_2   9
2015-03-31_p002_3   4
2015-04-01_p002_3   6
2015-04-02_p002_3   8
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-02_p127_1  13
2016-04-03_p127_1  15
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19
2016-04-01_p128_3  12
2016-04-02_p128_3  14
2016-04-03_p128_3  16

EDIT:

If duplicates, then is necessary create new columns, sort and last drop them:

df[[0,1,2]] = df.index.to_series().str.split('_', expand=True)
df[0] = pd.to_datetime(df[0])
df = df.sort_values(by=[1,2,0])
df = df.drop([0,1,2], axis=1)
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-04-01_p001_2   5
2015-04-02_p001_2   7
2015-04-03_p001_2   9
2015-03-31_p002_3   4
2015-04-01_p002_3   6
2015-04-02_p002_3   8
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-02_p127_1  13
2016-04-03_p127_1  15
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19
2016-04-01_p128_3  12
2016-04-02_p128_3  14
2016-04-03_p128_3  16
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252