I have some dataframes (df) with a categorical column whose values are a, b, c, plus a catch-all category for the "remaining categories".

I would like to sort the month column in ascending order (ascending=True), and then have the category column sorted in the following order:

c

a

b

"remaining category"

Is this possible? Basically, I want a custom sort order for one specific column, while the month column stays sorted by date.

yoshiserry
  • sure, this is the point of the new Categorical data type, see here: http://pandas.pydata.org/pandas-docs/stable/categorical.html – Jeff Dec 02 '14 at 23:17
  • Thanks Jeff -- Can you make a column in a dataframe into a categorical datatype after you have imported the data? I.e. at the moment my categorical data is of datatype object, not categorical? I'd like to run some of the operations on it from the page you suggested. – yoshiserry Dec 02 '14 at 23:23

3 Answers

docs are here

In [7]: import pandas as pd

In [8]: df = pd.DataFrame({'A' : [1,1,1,2,2,3], 'B' : list('bbcdae') })

In [9]: df.dtypes
Out[9]: 
A     int64
B    object
dtype: object

In [10]: df['B'] = pd.Categorical(df['B'],categories=list('ghbaedfc'))

In [11]: df
Out[11]: 
   A  B
0  1  b
1  1  b
2  1  c
3  2  d
4  2  a
5  3  e

In [12]: df.dtypes
Out[12]: 
A       int64
B    category
dtype: object

In [13]: df.sort_values(['B','A'])  # originally df.sort(['B','A']); DataFrame.sort() was removed in pandas 0.20
Out[13]: 
   A  B
0  1  b
1  1  b
4  2  a
5  3  e
3  2  d
2  1  c
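
Applied to the layout described in the question, the same idea might look like the sketch below. The column names month and category, the sample values, and the label "other" standing in for the remaining categories are assumptions made for illustration; they are not from the original post.

```python
import pandas as pd

# Hypothetical data: the column names and values are assumed, not from the question.
df = pd.DataFrame({
    "month": pd.to_datetime(
        ["2014-02-01", "2014-01-01", "2014-01-01", "2014-02-01", "2014-01-01"]
    ),
    "category": ["a", "other", "c", "c", "b"],
})

# Custom order: c first, then a, then b, then the catch-all label.
order = ["c", "a", "b", "other"]
df["category"] = pd.Categorical(df["category"], categories=order, ordered=True)

# Months ascend by date; within each month, rows follow the custom order.
result = df.sort_values(["month", "category"])
print(result)
```

Note that values not listed in categories become NaN when converted with pd.Categorical, so any "remaining" values would need to be mapped to the catch-all label first.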
Jeff
  • Thanks Jeff, that worked perfectly. Where can I find out more about these methods that can be applied to pd, like pd.to_datetime, pd.Categorical etc.? – yoshiserry Dec 02 '14 at 23:46

You can do it with a dict that maps each category to its sort position, and by adding a new 'sort' column to your dataframe. Check out this similar question: "Custom sorting with Pandas".
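
A minimal sketch of the dict approach, assuming hypothetical column names (month, category), an illustrative helper column called sort_key, and made-up sample values. Anything missing from the dict is treated as a "remaining" category and placed last.

```python
import pandas as pd

# Hypothetical data: the column names and values are assumed for illustration.
df = pd.DataFrame({
    "month": [2, 1, 1, 2, 1],
    "category": ["a", "zzz", "c", "c", "b"],
})

# Position of each known category in the desired order.
rank = {"c": 0, "a": 1, "b": 2}

# Categories missing from the dict map to NaN; give them the last position.
df["sort_key"] = df["category"].map(rank).fillna(len(rank)).astype(int)

# Sort by month first, then by the helper column, and drop the helper afterwards.
result = df.sort_values(["month", "sort_key"]).drop(columns="sort_key")
print(result)
```

Compared with pd.Categorical, this leaves the column's dtype as object, at the cost of a temporary extra column.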

Bob Haffner

The pandas sort() method was deprecated and then removed in release 0.20 (2017-05-05).

Use sort_values() and sort_index() instead of sort(). To get a df in a custom category order, convert the column with pd.Categorical() and then sort by it:

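import pandas as pd

df = pd.DataFrame({'Panel': ['Left', 'Right', 'Top', 'Bottom', 'Left'],
                   'Value': [70, 50, 30, 40, 60]})
# Declare the desired order; sorting then follows it instead of alphabetical order
df['Panel'] = pd.Categorical(df['Panel'],
                             categories=['Top', 'Bottom', 'Left', 'Right'], ordered=True)
df = df.sort_values('Panel')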

results

    Panel   Value
2   Top     30
3   Bottom  40
0   Left    70
4   Left    60
1   Right   50
Shane S