I have a dataframe called 'bal'. It looks like this:
ano id unit period
business_id
9564 2012 302 sdasd anual
9564 2011 303 sdasd anual
2361 2013 304 sdasd anual
2361 2012 305 sdasd anual
...
I'm running the following code on it:
bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)
My intention is to turn that into something like this:
ano id unit period 2006 2007 2008 2009 2010 \
business_id
72 2013 774 sdasd anual NaN NaN NaN NaN NaN
72 2012 775 sdasd anual NaN NaN NaN NaN NaN
74 2012 1120 sdasd anual NaN NaN NaN NaN NaN
119 2013 875 sdasd anual NaN NaN NaN NaN NaN
119 2012 876 sdasd anual NaN NaN NaN NaN NaN
...
When I that code, I get this error:
ValueError: Index contains duplicate entries, cannot reshape
So to avoid duplicates, I added a drop_duplicates line:
bal=bal.drop_duplicates()
bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)
When I run the code, voilá, I get the same problem:
ValueError: Index contains duplicate entries, cannot reshape
Am I doing something wrong or misunderstanding something?
EDIT
bal
is a dataframe I'm creating out of a SQL using the following code:
bal=pd.read_sql('select * from table;',connection).set_index('business_id')[['ano','id','unit','period']]
The weird thing is that if I limit the SQL query it works fine:
bal=pd.read_sql('select * from table limit 1000;',connection).set_index('business_id')[['ano','id','unit','period']]
I thought that the problem could be related to the fact that the index has a lot of duplication (as you can see in that example above). However if I print(bal.head(4))
in this limited bal it looks exactly the same as the one you can see above, with indexes that repeat.