Why am I getting the error "ValueError: cannot reindex from a duplicate axis"?

Question

Here I'm just posting the part of the code that throws the error. I'm concatenating two different sets of DataFrames that have been appended to two different lists.

import os
import numpy as np
import pandas as pd

path1 = '/home/Desktop/computed_2d_blaze/'
path2 = '/home/Desktop/computed_1d/'
path3 = '/home/Desktop/sn_airmass_seeing/'

dir1 = [x for x in os.listdir(path1) if '.ares' in x]
dir2 = [x for x in os.listdir(path2) if '.ares' in x]
dir3 = [x for x in os.listdir(path3) if '.ares' in x]

lst = []
lst1 = []

for file1, file2,file3 in zip(dir1,dir2,dir3):
   df1 = pd.read_table(path1+file1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
   df2 = pd.read_table(path2+file2, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')

   df1 = df1.groupby('wave').mean().reset_index()
   df1 = df1.sort_values('wave').reset_index(drop=True)
   df2 = df2.sort_values('wave').reset_index(drop=True)

   dfs = pd.merge(df1,df2, on='wave', how='inner')
   dfs['delta_ew'] = (dfs.EWs_x - dfs.EWs_y)
   dfs=dfs.filter(items=['wave','delta_ew'])
   lst.append(dfs)

   df3 = pd.read_table(path3+file3, skiprows=0, usecols=(0,1,2),names=['seeing','airmass','snr'],delimiter=r'\s+')
   lst1.append(df3)

[df.set_index('wave', inplace=True) for df in lst]
df=pd.concat(lst,axis=1,join='inner')

x = pd.concat(lst1,axis=1,join='inner')

for z in df.index:
   t = x.loc[0, 'airmass']
   s = df.loc[z, 'delta_ew']
   dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])
   dfs = dfs[np.abs(dfs.delta_ew - dfs.delta_ews.mean()) <= (dfs.delta_ews.mad())]

There are some outliers in delta_ew, so I'm trying to create a new DataFrame with them removed. But when I tried this, I got the error ValueError: cannot reindex from a duplicate axis.

I don't understand how to solve this error. Can anyone tell me where I'm making a mistake?

Here's the full traceback:

 Traceback (most recent call last):
  File "/home/gyanender/Desktop/r_values/airmass_vs_ew/delta_ew/for_rvalues.py", line 72, in <module>
    dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 213, in concat
    return op.get_result()
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 385, in get_result
    df = cons(data, index=index)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6168, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6465, in _homogenize
    v = v.reindex(index, copy=False)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/series.py", line 2681, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3023, in reindex
    fill_value, copy).__finalize__(self)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3041, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3145, in _reindex_with_indexers
    copy=copy)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 4139, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2944, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Possible duplicate of [What does \`ValueError: cannot reindex from a duplicate axis\` mean?](https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean) — Rarblack, Oct 27 '18 at 19:18
E.g. [Get exception description and stack trace which caused an exception, all as a string](https://stackoverflow.com/questions/4564559/get-exception-description-and-stack-trace-which-caused-an-exception-all-as-a-st) — jpp, Oct 27 '18 at 19:53
I was trying some other things, but it's correct now; I've edited it @Rarblack — astroluv, Oct 27 '18 at 20:09
Your error is raised from this line: dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass']) — Rarblack, Oct 27 '18 at 20:10
Yeah, I know that, but I don't understand why it's throwing this error or how I can solve it. — astroluv, Oct 27 '18 at 20:12

score 1 · Answer 1 · answered Oct 28 '18 at 02:56

I finally managed to sort out this problem. Instead of concat I used a dictionary, since the problem was in concatenating two pandas Series to make a new DataFrame. I first took the values of the Series t and s, put them into a dictionary, and then converted that dictionary into a DataFrame. It worked perfectly fine for me.

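for z in df.index:
   # .values drops the duplicated labels and leaves plain NumPy arrays
   t = x.loc[0, 'airmass'].values
   s = df.loc[z, 'delta_ew'].values
   # note: a dict keeps only one airmass per distinct delta_ew value
   dic = dict(zip(s, t))
   # list(...) keeps this working on Python 3, where dict.items() returns a view
   q = pd.DataFrame(list(dic.items()), columns=['ew', 'airmass'])
   q = q[np.abs(q.ew - q.ew.mean()) <= (q.ew.mad())]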
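
A possible variation on the same idea, offered only as a sketch: the DataFrame can be built from the two value arrays directly, which avoids the dictionary step. Because dictionary keys are unique, dict(zip(s, t)) keeps only one airmass for each distinct delta_ew value, whereas this version keeps every row. The numbers are invented, and `s` and `t` are hypothetical stand-ins for the Series in the loop. As far as I know, Series.mad() is no longer available in pandas 2.0 and later, so the mean absolute deviation is computed by hand here.

```python
import pandas as pd

# Hypothetical stand-ins for s and t: Series whose index repeats one label.
s = pd.Series([0.5, 0.5, 4.0, 0.4], index=['delta_ew'] * 4)
t = pd.Series([1.2, 1.5, 1.1, 1.3], index=['airmass'] * 4)

# Build the frame from the raw arrays; the duplicated labels are not used,
# and rows that share a delta_ew value are all kept.
q = pd.DataFrame({'ew': s.values, 'airmass': t.values})

# Mean absolute deviation computed by hand (what Series.mad() used to return).
deviation = (q.ew - q.ew.mean()).abs()
mean_abs_dev = deviation.mean()

# Keep rows whose deviation from the mean is within the mean absolute deviation.
q_filtered = q[deviation <= mean_abs_dev]
print(q_filtered)
```

With these made-up values the row with ew = 4.0 is the only one filtered out.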

Rarblack · Accepted Answer · 2018-10-27T20:33:38.540

-1

This error usually arises when you join or assign to a column while the index has duplicate values.
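
To see where duplicate labels can come from in code like the question's, a small check such as the following may help. It is only a sketch with made-up numbers: the frames `a` and `b` are hypothetical stand-ins for two of the df3 tables, and it assumes the duplicates appear because concatenating with axis=1 repeats the column names.

```python
import pandas as pd

# Hypothetical stand-ins for two of the df3 frames (values invented).
a = pd.DataFrame({'seeing': [1.0, 1.1], 'airmass': [1.2, 1.3]})
b = pd.DataFrame({'seeing': [0.9, 1.0], 'airmass': [1.5, 1.6]})

# Side-by-side concat keeps every column, so the column names repeat.
x = pd.concat([a, b], axis=1, join='inner')

# One row and one repeated column label: the result is a Series
# whose index is 'airmass', 'airmass'.
t = x.loc[0, 'airmass']

print(t.index.is_unique)     # whether every label is distinct
print(t.index.duplicated())  # marks labels that repeat an earlier one
```

If `is_unique` is False for either Series, aligning them on their labels is ambiguous, which is what the error message is complaining about.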

The error is raised by the line dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass']). I believe I have found a solution to your problem: just add ignore_index=True to the concat call.

Like this:

dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'], ignore_index=True )

which discards the existing labels along the concatenation axis and renumbers them from 0.

Note: in pandas, both the row labels and the column names are indexes.
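
If ignore_index=True does not remove the error, one alternative that may be worth trying is to drop the duplicated row labels from each Series before concatenating. This is a sketch with invented values, not a tested fix for the original data: `s` and `t` are hypothetical stand-ins for the Series in the question, and `keys` is used here to name the resulting columns.

```python
import pandas as pd

# Hypothetical stand-ins: each Series repeats a single label in its index.
s = pd.Series([0.5, -0.2, 0.1], index=['delta_ew'] * 3)
t = pd.Series([1.2, 1.5, 1.1], index=['airmass'] * 3)

# Replace the duplicated labels with a fresh 0..n-1 index, so the two
# Series line up by position.
s_clean = s.reset_index(drop=True)
t_clean = t.reset_index(drop=True)

# With axis=1, the entries of keys become the column names.
dfs = pd.concat([s_clean, t_clean], axis=1, keys=['delta_ew', 'airmass'])
print(dfs)
```

This pairs the values by position, so it assumes both Series have the same length and the same ordering.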

edited Oct 27 '18 at 20:33

answered Oct 27 '18 at 18:51

Rarblack


Instead of loc, try ix, such as: t = x.ix[0, 'airmass'] s = df.ix[z, 'delta_ew'] – Rarblack Oct 27 '18 at 21:13
Hey, the issue is in this line **dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])**, and if I don't use **axis=1** then it concatenates properly. But my problem is that I want to concatenate on **axis=1** – astroluv Oct 27 '18 at 22:22
I knew that. The reason is that when you use axis=1, it changes the indexes as well as the column names – Rarblack Oct 27 '18 at 22:28
Sir, upvotes don't matter! I'm stuck on this and I've got to solve this problem. – astroluv Oct 27 '18 at 22:33
I think there's another way to solve it, but I don't know if there's a function, something like **q = pd.DataFrame([s,t],ignore_index=True)**, that could solve this issue. Unfortunately I couldn't find anything. – astroluv Oct 27 '18 at 22:36
Yeah, I tried it with **pd.concat([s,t],ignore_index=True)** but not with **pd.DataFrame([s,t],ignore_index=True)**. I was thinking of creating a new DataFrame instead of concatenating those two Series. But unfortunately even that didn't work. – astroluv Oct 27 '18 at 22:44
You may use merge instead of concat https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging – Rarblack Oct 27 '18 at 22:54
Can you post your output when axis is 0? – Rarblack Oct 27 '18 at 23:08
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/182656/discussion-between-astroluv-and-rarblack). – astroluv Oct 27 '18 at 23:12
