3

Hi I am having a problem sorting some data in a pandas DataFrame. The format of the data is different to what I am used to, and I have no idea how to make it work. The data I want to sort is on this format:

[['2016-05-23', 'name1', 'data1'],
['2016-05-23', 'name2', 'data2'],
['2016-05-24', 'name1', 'data1'],
['2016-05-24', 'name2', 'data2'],
['2016-05-25', 'name1', 'data1'],
['2016-05-25', 'name2', 'data2'],
['2016-05-26', 'name1', 'data1'],
['2016-05-26', 'name2', 'data2'],
['2016-05-27', 'name1', 'data1'],
['2016-05-27', 'name2', 'data2']]

What I want to do is to get it sortet on date, and with the name as colum names.

[['2016-05-23, 'data1', 'data2'],
['2016-05-24', 'data1', 'data2'],
['2016-05-25', 'data1', 'data2'] and so on...]

Is it possible to do this with some specific pandas command, I would also like the solution to work with more that just the two name columns. I have no idea how to do this and would appriciate any help.

Siesta
  • 451
  • 7
  • 21

1 Answers1

2

I think you need pivot:

import pandas as pd

df = pd.DataFrame([['2016-05-23', 'name1', 'data1'],
['2016-05-23', 'name2', 'data2'],
['2016-05-24', 'name1', 'data1'],
['2016-05-24', 'name2', 'data2'],
['2016-05-25', 'name1', 'data1'],
['2016-05-25', 'name2', 'data2'],
['2016-05-26', 'name1', 'data1'],
['2016-05-26', 'name2', 'data2'],
['2016-05-27', 'name1', 'data1'],
['2016-05-27', 'name2', 'data2']], columns = ['a','b','c'])

print (df)
            a      b      c
0  2016-05-23  name1  data1
1  2016-05-23  name2  data2
2  2016-05-24  name1  data1
3  2016-05-24  name2  data2
4  2016-05-25  name1  data1
5  2016-05-25  name2  data2
6  2016-05-26  name1  data1
7  2016-05-26  name2  data2
8  2016-05-27  name1  data1
9  2016-05-27  name2  data2
#convert column a to datetime
df['a'] = pd.to_datetime(df.a)

print (df.pivot(index='a', columns='b', values='c'))
b           name1  name2
a                       
2016-05-23  data1  data2
2016-05-24  data1  data2
2016-05-25  data1  data2
2016-05-26  data1  data2
2016-05-27  data1  data2

And then if you need sort_values e.g. by column name1:

import pandas as pd

df = pd.DataFrame([['2016-05-23', 'name1', 9],
['2016-05-23', 'name2', 4],
['2016-05-24', 'name1', 5],
['2016-05-24', 'name2', 1],
['2016-05-25', 'name1', 5],
['2016-05-25', 'name2', 7],
['2016-05-26', 'name1', 10],
['2016-05-26', 'name2', 7],
['2016-05-27', 'name1', 0],
['2016-05-27', 'name2', 1]], columns = ['a','b','c'])
print (df)
            a      b   c
0  2016-05-23  name1   9
1  2016-05-23  name2   4
2  2016-05-24  name1   5
3  2016-05-24  name2   1
4  2016-05-25  name1   5
5  2016-05-25  name2   7
6  2016-05-26  name1  10
7  2016-05-26  name2   7
8  2016-05-27  name1   0
9  2016-05-27  name2   1

print (df.pivot(index='a', columns='b', values='c').sort_values('name1'))
b           name1  name2
a                       
2016-05-27      0      1
2016-05-24      5      1
2016-05-25      5      7
2016-05-23      9      4
2016-05-26     10      7

Sometimes pivot doesnt work, then use pivot_table:

print (df.pivot_table(index='a', columns='b', values='c'))

But pivot_table uses aggfunc, default is aggfunc=np.mean if duplicates. Better explanation with sample is here and in docs.


Last you can reset_index and rename_axis (new in pandas 0.18.0):

print (df.pivot(index='a', columns='b', values='c')
         .reset_index()
         .rename_axis(None, axis=1))
            a  name1  name2
0  2016-05-23      9      4
1  2016-05-24      5      1
2  2016-05-25      5      7
3  2016-05-26     10      7
4  2016-05-27      0      1

#pandas bellow 0.18.0
df1 = df.pivot(index='a', columns='b', values='c').reset_index()
df1.columns.name = None
print (df1)
            a  name1  name2
0  2016-05-23      9      4
1  2016-05-24      5      1
2  2016-05-25      5      7
3  2016-05-26     10      7
4  2016-05-27      0      1
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, this seems to be working, is there a way to get combine this into a new dataframe? I tired to to do print (df.pivot(index='a', columns='b', values='c').sort_values('a')) in a new DataFrame to get it sorted on date, but that did not work – Siesta May 30 '16 at 10:41
  • But `a` is index and it is sorted... If not, you can use `df1 = df.pivot(index='a', columns='b', values='c').reset_index().sort_values('a')` – jezrael May 30 '16 at 10:46
  • I also add code for converting string column a to datetime, see edit. – jezrael May 30 '16 at 10:50
  • Sorry for that, it was I who wrote it in wrong. It seems to be working perfectly. – Siesta May 30 '16 at 10:51
  • Thank you for accepting. Glad can help you! Good luck! – jezrael May 30 '16 at 10:52