
Here I'm posting only the part of the code that throws the error. I'm concatenating two different sets of DataFrames, which have been appended to two different lists.

import os

import numpy as np
import pandas as pd

path1 = '/home/Desktop/computed_2d_blaze/'
path2 = '/home/Desktop/computed_1d/'
path3 = '/home/Desktop/sn_airmass_seeing/'

# os.listdir() returns names in arbitrary order; sorting assumes corresponding
# files in the three folders sort into the same positions, so zip() pairs them
dir1 = sorted(x for x in os.listdir(path1) if '.ares' in x)
dir2 = sorted(x for x in os.listdir(path2) if '.ares' in x)
dir3 = sorted(x for x in os.listdir(path3) if '.ares' in x)

lst = []   # one (wave, delta_ew) table per file
lst1 = []  # one (seeing, airmass, snr) table per file
cols = ['wave', 'num', 'stlines', 'fwhm', 'EWs', 'MeasredWave']

for file1, file2, file3 in zip(dir1, dir2, dir3):
    df1 = pd.read_table(os.path.join(path1, file1), usecols=(0, 1, 2, 3, 4, 8), names=cols, delimiter=r'\s+')
    df2 = pd.read_table(os.path.join(path2, file2), usecols=(0, 1, 2, 3, 4, 8), names=cols, delimiter=r'\s+')

    # average repeated wavelengths in df1, then sort both tables by wavelength
    df1 = df1.groupby('wave').mean().reset_index()
    df1 = df1.sort_values('wave').reset_index(drop=True)
    df2 = df2.sort_values('wave').reset_index(drop=True)

    # inner merge on wavelength; shared columns get _x (df1) and _y (df2) suffixes
    dfs = pd.merge(df1, df2, on='wave', how='inner')
    dfs['delta_ew'] = dfs.EWs_x - dfs.EWs_y
    dfs = dfs.filter(items=['wave', 'delta_ew'])
    lst.append(dfs)

    df3 = pd.read_table(os.path.join(path3, file3), usecols=(0, 1, 2), names=['seeing', 'airmass', 'snr'], delimiter=r'\s+')
    lst1.append(df3)

lst = [d.set_index('wave') for d in lst]
# each frame contributes its own 'delta_ew' column to the side-by-side result
df = pd.concat(lst, axis=1, join='inner')

# likewise, x gets one 'seeing', 'airmass' and 'snr' column per file
x = pd.concat(lst1, axis=1, join='inner')

for z in df.index:
    t = x.loc[0, 'airmass']
    s = df.loc[z, 'delta_ew']
    dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])
    dfs = dfs[np.abs(dfs.delta_ew - dfs.delta_ew.mean()) <= (dfs.delta_ew.mad())]

I'm trying to create a new DataFrame because there are some outliers in delta_ew, and this is how I'm removing them. But when I run it, I get this error: ValueError: cannot reindex from a duplicate axis.
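The last line of the loop is meant to keep only the rows whose delta_ew lies within one mean absolute deviation of the mean. A minimal sketch of that filter on its own, with made-up values, assuming pandas 2.0 or later, where Series.mad() has been removed, so the deviation is computed by hand:

```python
import pandas as pd

# Made-up delta_ew values for illustration only; 5.0 plays the outlier.
dfs = pd.DataFrame({"delta_ew": [0.1, -0.2, 0.15, 0.05, 5.0]})

# Distance of each value from the column mean.
dev = dfs["delta_ew"] - dfs["delta_ew"].mean()

# Mean absolute deviation, i.e. what Series.mad() returned before pandas 2.0.
mad = dev.abs().mean()

# Keep rows no further from the mean than one mean absolute deviation.
filtered = dfs[dev.abs() <= mad]
print(filtered)
```

Note that the mean itself is pulled towards a large outlier, so this criterion is fairly loose; a median-based threshold is a common alternative if that becomes a problem.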

I don't understand how to solve this error. Can anyone tell me where I'm making a mistake?

Here's the full traceback:

 Traceback (most recent call last):
  File "/home/gyanender/Desktop/r_values/airmass_vs_ew/delta_ew/for_rvalues.py", line 72, in <module>
    dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 213, in concat
    return op.get_result()
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 385, in get_result
    df = cons(data, index=index)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6168, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6465, in _homogenize
    v = v.reindex(index, copy=False)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/series.py", line 2681, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3023, in reindex
    fill_value, copy).__finalize__(self)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3041, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3145, in _reindex_with_indexers
    copy=copy)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 4139, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2944, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Rarblack
astroluv
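The traceback ends in a reindex that refuses duplicate labels. A small reproduction of where such labels can come from, using two made-up tables shaped like df3 in the question: concatenating per-file frames side by side repeats the column names, and selecting a repeated name with .loc returns a Series whose index is that name repeated. The exact exception type and message differ between pandas versions, so the sketch only prints them.

```python
import pandas as pd

# Two hypothetical per-file tables, shaped like df3 in the question.
a = pd.DataFrame({"seeing": [0.8], "airmass": [1.1], "snr": [120.0]})
b = pd.DataFrame({"seeing": [0.9], "airmass": [1.3], "snr": [95.0]})

# Side-by-side concatenation keeps both sets of column names.
x = pd.concat([a, b], axis=1, join="inner")
print(list(x.columns))  # ['seeing', 'airmass', 'snr', 'seeing', 'airmass', 'snr']

# Selecting a duplicated column label returns a Series whose index
# is that label repeated, so the index is not unique.
t = x.loc[0, "airmass"]
print(t.index.is_unique)  # False

# A made-up stand-in for s, indexed by a repeated 'delta_ew' label.
s = pd.Series([0.1, -0.2], index=["delta_ew", "delta_ew"])

# axis=1 has to align s and t on their row labels, which typically fails
# when those labels are duplicated.
try:
    pd.concat([s, t], axis=1)
except Exception as exc:
    print(type(exc).__name__, exc)
```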

2 Answers


I finally managed to sort out this problem. Instead of concat, I used a dictionary. The problem I was facing was concatenating two pandas Series into a new DataFrame, so I first put the values of the Series t and s into a dictionary and then converted that dictionary into a DataFrame, and it worked perfectly for me.

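for z in df.index:
    # .values drops the duplicated 'airmass' / 'delta_ew' labels, leaving plain arrays
    t = x.loc[0, 'airmass'].values
    s = df.loc[z, 'delta_ew'].values
    # both arrays hold one value per file; keying the dict by column name keeps
    # every pair, whereas dict(zip(s, t)) would drop rows with repeated ew values
    dic = {'ew': s, 'airmass': t}
    q = pd.DataFrame(dic)
    # mean absolute deviation, i.e. what q.ew.mad() returned before pandas 2.0
    dev = q.ew - q.ew.mean()
    q = q[np.abs(dev) <= dev.abs().mean()]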
astroluv
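The dictionary step matters more than it looks. A small sketch with made-up numbers of why dict(zip(s, t)) can lose rows: the ew values become dictionary keys, so any repeated value collapses into a single entry. Keying the dictionary by column name instead keeps every (ew, airmass) pair in file order.

```python
import numpy as np
import pandas as pd

# Hypothetical values: the first two files happen to share the same delta_ew.
s = np.array([0.1, 0.1, -0.2])  # delta_ew, one value per file
t = np.array([1.1, 1.3, 1.5])   # airmass, one value per file

# dict(zip(...)) uses the ew values as keys, so the repeated 0.1 collapses.
by_value = dict(zip(s, t))
print(len(by_value))  # 2

# A dict keyed by column name keeps all three pairs.
q = pd.DataFrame({"ew": s, "airmass": t})
print(len(q))  # 3
```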

This error usually arises when you join or assign to a column while the index has duplicate values, because pandas then has to align rows on labels that are not unique.

The error is raised by the line dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass']). I believe I found the solution to your problem: just add ignore_index=True to the concat call.

Like this:

dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'], ignore_index=True )

which will re-create the indexes.

Note: here "index" refers to both the row and the column labels.

Rarblack
  • Instead of loc, try ix, such as: t = x.ix[0, 'airmass'] and s = df.ix[z, 'delta_ew'] – Rarblack Oct 27 '18 at 21:13
  • Hey, the issue is in this line **dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])**, and if I don't use **axis=1** then it concatenates properly. But my problem is that I want to concatenate along **axis=1**. – astroluv Oct 27 '18 at 22:22
  • I know; the reason is that when you use axis=1 it changes the indexes as well as the column names. – Rarblack Oct 27 '18 at 22:28
  • Sir, upvotes don't matter! I'm stuck on this and I've got to solve this problem. – astroluv Oct 27 '18 at 22:33
  • I think there's another way to solve it, but I don't know if there's a function or something like **q = pd.DataFrame([s,t],ignore_index=True)** that could solve this issue; unfortunately I couldn't find anything. – astroluv Oct 27 '18 at 22:36
  • Yeah, I tried it with **pd.concat([s,t],ignore_index=True)** but not with **pd.DataFrame([s,t],ignore_index=True)**. I was thinking of creating a new DataFrame instead of concatenating those two Series, but unfortunately even that didn't work. – astroluv Oct 27 '18 at 22:44
  • You may use merge instead of concat https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging – Rarblack Oct 27 '18 at 22:54
  • Can you post your output when axis is 0? – Rarblack Oct 27 '18 at 23:08
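The discussion above goes back and forth between axis=0, axis=1, ignore_index and merge. A small sketch of how each behaves, assuming two equal-length stand-in Series s and t with duplicated labels like the ones in the question (the values are made up). According to the pandas documentation, ignore_index=True relabels only the concatenation axis, which for axis=1 is the columns, so by itself it does not remove the duplicated row labels; resetting both indexes first does, and keys= (rather than names=, which names the levels of a hierarchical index) sets the column labels.

```python
import pandas as pd

# Made-up stand-ins for the question's s and t: equal length, duplicated labels.
s = pd.Series([0.1, -0.2, 0.15], index=["delta_ew"] * 3)
t = pd.Series([1.1, 1.3, 1.5], index=["airmass"] * 3)

# axis=0 (the default) just stacks t below s; no row alignment happens,
# so the duplicated labels are carried along and nothing fails.
stacked = pd.concat([s, t])
print(len(stacked))  # 6

# For axis=1, give both Series the same unique 0..n-1 index first,
# then let keys= name the resulting columns.
s0 = s.reset_index(drop=True)
t0 = t.reset_index(drop=True)
side_by_side = pd.concat([s0, t0], axis=1, keys=["delta_ew", "airmass"])
print(side_by_side)

# The merge route works the same way once the positional index
# serves as the join key; merge needs named Series, hence rename().
merged = pd.merge(
    s0.rename("delta_ew"),
    t0.rename("airmass"),
    left_index=True,
    right_index=True,
)
print(merged)
```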