0

I'm pretty new to python and pandas and I'm trying to transform some data. I have a dataset with three columns as shown below:

A       B               C
col1    21-03-2019      1.2
col2    21-03-2019      23
col3    21-03-2019      45
col4    21-03-2019      2.4
col5    21-03-2019      78
col1    14-07-2019      0.1
col2    14-07-2019      AM
col3    14-07-2019      CDM
col4    14-07-2019      66
col5    14-07-2019      0.1

I'm tyring to pivot the dataframe using B as my index and the pivot table works fine.

import pandas as pd 

# creating a dataframe 
df = pd.DataFrame({'A': ['col1', 'col2', 'col3', 'col4', 'col5', 'col1', 'col2', 'col3' ,'col4', 'col5'], 
      'B': [21-03-2019,21-03-2019,21-03-2019,21-03-2019,21-03-2019, 14-07-2019,14-07-2019,14-07-2019,14-07-2019,14-07-2019], 
      'C': [1.2, 23, 45, 2.4, 78, 0.1, 'AM', 'CDM', 66, 0.1]}) 

df.pivot(index='B', columns='A', values='C')



A            col1 col2 col3 col4 col5
B                           
21-03-2019  1.2   23   45   2.4   78
14-07-2019  0.1   AM   CDM  66    0.1

But there are situations in my dataframe where B column is same for all records, as shown below:

A       B               C
col1    21-03-2019      1.2
col2    21-03-2019      23
col3    21-03-2019      45
col4    21-03-2019      2.4
col5    21-03-2019      78
col1    21-03-2019      0.1
col2    21-03-2019      AM
col3    21-03-2019      CDM
col4    21-03-2019      66
col5    21-03-2019      0.1

In this case, the pivot fails with the below error:

ValueError: Index contains duplicate entries, cannot reshape

I tried resetting the index (although I don't know what resetting index means) df.pivot(index='B', columns='B', values='C').reset_index('B') but it's still the same error.

My expected output is:

A           col1  col2 col3 col4  col5
B                           
21-03-2019  1.2   23   45   2.4   78
21-03-2019  0.1   AM   CDM  66    0.1

How can I solve this? Can someone please help me?

LearneR
  • 2,351
  • 3
  • 26
  • 50
  • How do you want to aggregate `C` in those cases? – yatu Apr 03 '20 at 11:22
  • Hi @LearneR, welcome to Python/Pandas! For learning more about things like resetting the index, you might find [this tutorial](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) useful. – ASGM Apr 03 '20 at 11:22
  • @yatu: I don't really want any aggregation for C. I just want the values to be displayed as-is. – LearneR Apr 03 '20 at 11:27
  • One of the problems Pandas is having is that it doesn't know what logic to use to combine the rows, since `B` is the same for all rows. What distinguishes the first `col1` from the second? In the first dataframe it's the different values of `B`, but in the second dataframe Pandas isn't sure what to do. Do you always want to pivot in groups of five? Can you explain the underlying logic? – ASGM Apr 03 '20 at 11:28

1 Answers1

2

If col1 exist for each group and it is first is possible create helper column g by compare values by Series.eq and cumulative sum Series.cumsum:

df['g'] = df['A'].eq('col1').cumsum()

And then for no aggregation solution use DataFrame.set_index with Series.unstack:

df1 = df.set_index(['B', 'g', 'A'])['C'].unstack()

Or for aggregation by first value is possible use DataFrame.pivot_table:

df1 = df.pivot_table(index=['B', 'g'], columns='A', values='C', aggfunc='first')
print (df1)
A            col1 col2 col3 col4 col5
B          g                         
21-03-2019 1  1.2   23   45  2.4   78
           2  0.1   AM  CDM   66  0.1

Last is possible remove helper level of MultiIndex:

df1 = df1.reset_index(level=1, drop=True)
print (df1)
A          col1 col2 col3 col4 col5
B                                  
21-03-2019  1.2   23   45  2.4   78
21-03-2019  0.1   AM  CDM   66  0.1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Thank you very much for this solution. I used the one with `set_index()` and `unstack()` and it works perfectly for all my datasets. I really need to dig deeper into Pandas. I would've never come to this solution myself without your response. Thanks, again. – LearneR Apr 03 '20 at 12:22