119

I am trying to unstack a multi-index with pandas and I am keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

  • id (string)
  • date (string)
  • location (string)
  • value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

ARF
  • 7,420
  • 8
  • 45
  • 72
  • 2
    Are there any NaNs in your data? There was [an unstack bug related to those](https://github.com/pydata/pandas/issues/4862). – BrenBarn Feb 21 '15 at 20:45
  • What version of pandas are you using? Could you give a small example DataFrame which demonstrates this issue (so we can just run your code and see if we get the same result)? – Andy Hayden Feb 21 '15 at 20:57
  • @BrenBarn There are no NAN in my dataset. – ARF Feb 21 '15 at 21:27
  • @AndyHayden I am using pandas 0.15.2. I will try to create a dataframe that reproduces the result, but so far, I have been unable to do so. – ARF Feb 21 '15 at 21:28
  • This is a very old question, however my two cents.. I was getting this error and while checking the data I found there are duplicates data in my excel cell. I was trying to break multiple rows in a cell to individual rows. Once I remove duplicates, this error is not anymore – BigData-Guru Jul 29 '19 at 16:24
  • Just to add to last comment, `df.drop_duplicates(['id','date','location'])` would remove the duplicates so that the operation can proceed. Solution from https://stackoverflow.com/a/13833239/1769163 – cengique May 08 '23 at 21:55

3 Answers3

76

Here's an example DataFrame which show this, it has duplicate values with the same index. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape

One solution is to reset_index (and get back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • 21
    I had a large dataframe that I was fairly sure had the correct indices. Turned out that some of the sub-dataframes were being created incorrectly, and indices were duplicated. `np.where(df.index.duplicated())` is a real help here. – naught101 Apr 21 '17 at 04:04
70

There's a far more simpler solution to tackle this.

The reason why you get ValueError: Index contains duplicate entries, cannot reshape is because, once you unstack "Location", then the remaining index columns "id" and "date" combinations are no longer unique.

You can avoid this by retaining the default index column (row #) and while setting the index using "id", "date" and "location", add it in "append" mode instead of the default overwrite mode.

So use,

e.set_index(['id', 'date', 'location'], append=True)

Once this is done, your index columns will still have the default index along with the set indexes. And unstack will work.

Let me know how it works out.

HVS
  • 2,427
  • 3
  • 21
  • 19
  • 6
    2 hours trying to figure this out, couldn't believe it could be as hard as some of the posts I've seen. – Jeff Tilton May 02 '17 at 14:59
  • 5
    I've got 13 tabs open searching to solve this same issue and `append=True` is the only thing than worked. Thanks. – KWx Sep 30 '19 at 00:07
  • 1
    Awesome and easy solution! Made my day! – mgruber Apr 07 '20 at 13:09
  • append=True create a few columns with the same name. I am not sure this is the original intent of the OP. – Amnon Jun 28 '20 at 15:54
  • 2
    Why do you say `once you unstack "Location", then the remaining index columns "id" and "date" combinations are no longer unique`? They seem unique to me. – Dr_Zaszuś Jun 17 '21 at 08:31
  • Are column names then _lost_? Like, the column of a DataFrame are `print(modis_albedo_products.columns)`: `Index(['myd21', 'modis_product', 'correlation'], dtype='object')`, then doing `temp = modis_albedo_products.set_index(['myd21', 'modis_product', 'correlation'], append=True)` will give `print(temp.columns)`: `Index([], dtype='object')`. – Nikos Alexandris Aug 20 '21 at 12:17
  • Another thing to watch out for - if at some point you lost your column labels, you'll get this error. – James_SO Apr 25 '23 at 16:11
  • Appending to the index completely defeats the purpose of using a MultiIndex so it can help you group entities. – cengique May 08 '23 at 21:53
8

I had such problem. In my case problem was in data - my column 'information' contained 1 unique value and it caused error

UPDATE: to correct work 'pivot' pairs (id_user,information) cannot have duplicates

It works:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5], 
'information':['phon','phon','phone','phone1','phone','phone1','phone'], 
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

it doesn't work:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5], 
'information':['phone','phone','phone','phone','phone','phone','phone'], 
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

source: https://stackoverflow.com/a/37021196/6088984

Community
  • 1
  • 1
Grag2015
  • 591
  • 9
  • 15