Pivoting tables with duplicated data in pandas

Question

This question is closely related to the one I asked here, but it is not same.

My question is, suppose I have stacked data and I want pivot this table by Date and Name. My question is, when there are ambiguous values (more than one value for the same Date and Name) how can I choose between them ?

Name Date         Value 
a    2016-05-01   5
b    2016-05-01   7
c    2016-05-01   7
c    2016-05-01   8
a    2016-06-01   4
b    2016-06-01   3
c    2016-06-01   2
d    2016-06-01   2

Ideally I would like to "choose" the final value based on another column. For example suppose my dataset is:

Name Date         Value   PublishDate
a    2016-05-01   5       2016-04-01    
b    2016-05-01   7       2016-04-01
c    2016-05-01   7       2016-03-01
c    2016-05-01   8       2016-04-01
a    2016-06-01   4       2016-04-01
b    2016-06-01   3       2016-04-01
c    2016-06-01   2       2016-04-01
d    2016-06-01   2       2016-04-01

Then, between the [c,2016-05-01] duplicates, I'd like the one that was published last to stay (PublishDate=2016-04-01) and the other one to be discarded.

Is is possible ?

Ultimately I need this output, so that I can perform matrix operations on it.

Data/Name    a   b   c   d
2016-05-01   5   7   8   NaN
2016-06-01   4   3   2   2

I took a look at this question, but it is not quite the same, since I want one value to be discarded based on kind of a "rule".

But that question gives the insight that tere might be some use of the pandas.pivot_table function to help acomplish it. But I couldn't find it.

Thanks !

score 1 · Answer 1 · answered Apr 11 '16 at 13:04

1

I would sort by PublishDate, groupby what you want using the last aggregate function. It will return the last row (latest publish date for you) for each 'Name'.

df = df.sort_index(by='PublishDate').copy()
df.reset_index(inplace=True)
grouped_df = df.groupby('Name').last()

answered Apr 11 '16 at 13:04

dmb

1,669
1
12
21

Thanks for the help. That's before I pivot the dataset correct ? – Pedro Braz Apr 11 '16 at 13:07
additionally, I can't group by stock. it would have to be something like group by ``Stock`` and ``Date``. Is it possible ? – Pedro Braz Apr 11 '16 at 13:47
Another issue, after grouping I'm not being able to pivot the dataframe, because the indexes are no longer ``Date`` and ``Stock``, but the values themselves.I've updated the question to highlight that ultimately I need a matrix to perform matrix operations on it. If you could please address those issues on your answer I would be very grateful – Pedro Braz Apr 11 '16 at 13:56

Pedro Braz · Accepted Answer · 2016-04-11T18:21:48.020

0

Based on @dmb 's answer I came up with the following way of coping with the issues that I've stated on the comments:

df = df.sort_values(by='PublishDate').copy()
df = df.pivot_table(index = 'Date',columns = 'Stock',values = 'Value', aggfunc = 'last')

edited Apr 11 '16 at 18:21

answered Apr 11 '16 at 14:34

Pedro Braz

2,261
3
25
48

Pivoting tables with duplicated data in pandas

2 Answers2