0

I'm having issues with pivoting the below data

     index  column  data
0    1      A       cat
1    1      B       blue
2    1      C       seven
3    2      A       dog
4    2      B       green
5    2      B       red
6    2      C       eight
7    2      C       five
8    3      A       fish
9    3      B       pink
10   3      C       one

I am attempting to pivot it by using

df.pivot(index='index', columns='column', values="data")

But I receive the error "Index contains duplicate entries, cannot reshape"

I have looked through a large number of similar posts to this but none of the solutions I tried worked

My desired output is

index  A    B     C
1      cat  blue  seven
2      dog  green eight
2      dog  green five
2      dog  red   eight
2      dog  red   five
3      fish pink  one

What would be the best solution for this?

Jordan
  • 13
  • 3
  • Can you give a clear definition of the logic of this pivoting? I understand that for a particular index X you will have the all the combinations of A, B, C (e.g. for index=2, you will have 2 (for B)x2(for C)=4 rows)? A simple pivoting won't give you this, apparently, no matter how the index is treated. – Oleg O Dec 09 '19 at 12:10

1 Answers1

0

in this question Pandas pivot warning about repeated entries on index they state that duplicate pairs (so a duplicate pair in the columns 'index' and 'column') are not possible to pivot.

in your dataset, the index 2 has two times the column values B and C.

Can you change the 'index' column? See my new dataframe as an example:

    df = pd.DataFrame({'index': [1,1,1,2,2,3,2,4,3,4,3], 
               'column': ['A','B','C','A','B','B','C','C','A','B','C'], 
               'data':['cat','blue','seven', 'dog', 'green', 'red', 
                       'eight','five', 'fish', 'pink', 'one']})
    df
    out:
    index   column  data
    0   1   A       cat
    1   1   B       blue
    2   1   C       seven
    3   2   A       dog
    4   2   B       green
    5   3   B       red
    6   2   C       eight
    7   4   C       five
    8   3   A       fish
    9   4   B       pink
    10  3   C       one

    df.pivot('index', 'column', 'data')
    out:
    column  A    B      C
    index           
    1      cat  blue    seven
    2      dog  green   eight
    3      fish red     one
    4      NaN  pink    five

Option_2

If you use unstack with 'append':

   testing = df.set_index(['index', 'column'], 
                    append=True).unstack('column')
   testing

                data
      column    A    B      C
      index         
   0    1     cat   NaN     NaN
   1    1     NaN   blue    NaN
   2    1     NaN   NaN     seven
   3    2     dog   NaN     NaN
   4    2     NaN   green   NaN
   5    2     NaN   red     NaN
   6    2     NaN   NaN     eight
   7    3     NaN   NaN     five
   8    3     fish  NaN     NaN
   9    3     NaN   pink    NaN
   10   3     NaN   NaN     one
Renate van Kempen
  • 124
  • 1
  • 2
  • 10
  • I need the index column to be like that unfortunately, is there any other workaround to get the output I want without pivot? – Jordan Dec 10 '19 at 11:31
  • Only thing I can think of than is using: testing = df.set_index(['index', 'column'], append=True).unstack('column'), but this gives you a lot of NaN's in the values. I will add this as option 2 in my answer. Hope this helps – Renate van Kempen Dec 11 '19 at 14:58