Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

Question

I am trying to unstack a multi-index with pandas and I am keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

id (string)
date (string)
location (string)
value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

Are there any NaNs in your data? There was [an unstack bug related to those](https://github.com/pydata/pandas/issues/4862). — BrenBarn, Feb 21 '15 at 20:45
What version of pandas are you using? Could you give a small example DataFrame which demonstrates this issue (so we can just run your code and see if we get the same result)? — Andy Hayden, Feb 21 '15 at 20:57
@AndyHayden I am using pandas 0.15.2. I will try to create a dataframe that reproduces the result, but so far, I have been unable to do so. — ARF, Feb 21 '15 at 21:28
This is a very old question, however my two cents.. I was getting this error and while checking the data I found there are duplicates data in my excel cell. I was trying to break multiple rows in a cell to individual rows. Once I remove duplicates, this error is not anymore — BigData-Guru, Jul 29 '19 at 16:24
Just to add to last comment, `df.drop_duplicates(['id','date','location'])` would remove the duplicates so that the operation can proceed. Solution from https://stackoverflow.com/a/13833239/1769163 — cengique, May 08 '23 at 21:55

score 76 · Accepted Answer · answered Feb 21 '15 at 22:29

Here's an example DataFrame which show this, it has duplicate values with the same index. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape

One solution is to reset_index (and get back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...

I had a large dataframe that I was fairly sure had the correct indices. Turned out that some of the sub-dataframes were being created incorrectly, and indices were duplicated. `np.where(df.index.duplicated())` is a real help here. — naught101, Apr 21 '17 at 04:04

score 70 · Answer 2 · answered Jan 18 '16 at 13:14

70

There's a far more simpler solution to tackle this.

The reason why you get ValueError: Index contains duplicate entries, cannot reshape is because, once you unstack "Location", then the remaining index columns "id" and "date" combinations are no longer unique.

You can avoid this by retaining the default index column (row #) and while setting the index using "id", "date" and "location", add it in "append" mode instead of the default overwrite mode.

So use,

e.set_index(['id', 'date', 'location'], append=True)

Once this is done, your index columns will still have the default index along with the set indexes. And unstack will work.

Let me know how it works out.

answered Jan 18 '16 at 13:14

HVS

2,427
3
21
19

6

2 hours trying to figure this out, couldn't believe it could be as hard as some of the posts I've seen. – Jeff Tilton May 02 '17 at 14:59
5

I've got 13 tabs open searching to solve this same issue and `append=True` is the only thing than worked. Thanks. – KWx Sep 30 '19 at 00:07
1

Awesome and easy solution! Made my day! – mgruber Apr 07 '20 at 13:09
append=True create a few columns with the same name. I am not sure this is the original intent of the OP. – Amnon Jun 28 '20 at 15:54
2

Why do you say `once you unstack "Location", then the remaining index columns "id" and "date" combinations are no longer unique`? They seem unique to me. – Dr_Zaszuś Jun 17 '21 at 08:31
Are column names then _lost_? Like, the column of a DataFrame are `print(modis_albedo_products.columns)`: `Index(['myd21', 'modis_product', 'correlation'], dtype='object')`, then doing `temp = modis_albedo_products.set_index(['myd21', 'modis_product', 'correlation'], append=True)` will give `print(temp.columns)`: `Index([], dtype='object')`. – Nikos Alexandris Aug 20 '21 at 12:17
Another thing to watch out for - if at some point you lost your column labels, you'll get this error. – James_SO Apr 25 '23 at 16:11
Appending to the index completely defeats the purpose of using a MultiIndex so it can help you group entities. – cengique May 08 '23 at 21:53

score 8 · Answer 3 · edited May 23 '17 at 12:10

I had such problem. In my case problem was in data - my column 'information' contained 1 unique value and it caused error

UPDATE: to correct work 'pivot' pairs (id_user,information) cannot have duplicates

It works:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5], 
'information':['phon','phon','phone','phone1','phone','phone1','phone'], 
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

it doesn't work:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5], 
'information':['phone','phone','phone','phone','phone','phone','phone'], 
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

source: https://stackoverflow.com/a/37021196/6088984

The not working scenario works if it contains unique values for id_user. — Nuwan Abeynayake, Oct 11 '22 at 11:31

Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

3 Answers3

Linked