
The purpose of this code is to merge 900 CSV files into a single file with a unified, reshaped DateTime index.

Each file has the same 5 columns: `['DateTime', 'Actual', 'Consensus', 'Previous', 'Revised']`.

The files are not equal in row count or time steps, and merging the 900 source files produces 4500 columns in the one merged file. Here is some sample data:

- https://www.fxstreet.com/economic-calendar/event/b2ba798d-1490-4a1b-9a7d-174eeb414926
- https://www.fxstreet.com/economic-calendar/event/7a515013-5178-468b-b14b-19351b984e33
- https://www.fxstreet.com/economic-calendar/event/7b280ce6-08b3-4b5a-83a7-1a6014d4bab7
- https://www.fxstreet.com/economic-calendar/event/08f4deaa-a536-4746-9983-2400a2db4722
- https://www.fxstreet.com/economic-calendar/event/4bd3c8bc-c6d4-4f02-bed8-28d0caae7beb
- https://www.fxstreet.com/economic-calendar/event/86ab275d-83f3-4760-a9f3-3a51f842029f
- https://www.fxstreet.com/economic-calendar/event/6b0f63fa-277c-458c-b7d1-3c0f2963deaa
- https://www.fxstreet.com/economic-calendar/event/69921d2c-072f-47df-ba4f-3cbf0efaa293
- https://www.fxstreet.com/economic-calendar/event/4e7f2b96-19aa-46f9-8f4a-403f2334fadb
- https://www.fxstreet.com/economic-calendar/event/a0d1effc-9698-477f-9da6-41db2f7a1dbe
- https://www.fxstreet.com/economic-calendar/event/7a262107-aa0b-478c-a0c4-49b2a8589738
- https://www.fxstreet.com/economic-calendar/event/42ead1f1-30c2-459d-84ed-4a2b0deaba8a
- https://www.fxstreet.com/economic-calendar/event/015035d6-7dfe-4bb0-a138-5362dbcdf309
- https://www.fxstreet.com/economic-calendar/event/30d4dcf7-be62-4401-8cdd-96d1a8a989da
- https://www.fxstreet.com/economic-calendar/event/dad82d0d-0561-4275-bdea-56f8e3709581
- https://www.fxstreet.com/economic-calendar/event/d080d54a-58ac-49fc-9463-589260fdc6de
- https://www.fxstreet.com/economic-calendar/event/70d803d1-04e2-496b-a565-5fe08e89cc48
- https://www.fxstreet.com/economic-calendar/event/04e37486-f4aa-4a54-875b-62fae3b62486
- https://www.fxstreet.com/economic-calendar/event/0205d838-1106-4d7d-abdd-692f33fb5686
- https://www.fxstreet.com/economic-calendar/event/727dc859-0e7d-4209-8f28-70acb2ea2d61
- https://www.fxstreet.com/economic-calendar/event/14dce315-f073-4ffb-9205-95c78e42929c
- https://www.fxstreet.com/economic-calendar/event/3a35946f-7a82-4e4f-9582-9c61676eecb3
- https://www.fxstreet.com/economic-calendar/event/98bb2374-b9f9-46ae-93e3-9f7e8a4391c1
- https://www.fxstreet.com/economic-calendar/event/a5d475bb-27fa-4a82-90ba-9fc271c5b9bd

Here is the code:

```python
import os
import pandas as pd

# Read every CSV in the folder, using the first column (DateTime)
# as a parsed datetime index
os.chdir('E:\\Business\\Economic Indicators')
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('csv')]

# Outer-join all frames on the DateTime index, newest rows first,
# then drop duplicate index labels from the merged result
finaldf = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
finaldf = finaldf.loc[~finaldf.index.duplicated(keep='first')]

print(finaldf.head())
finaldf.to_csv('finaldf.csv')
```

But I get this error:

```
Traceback (most recent call last):
  File "E:/Tutorial/Driver/phantomjs-2.1.1-windows/bin/adaa.py", line 8, in <module>
    finaldf = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 212, in concat
    copy=copy)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 363, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 430, in _get_new_axes
    new_axes[i] = self._get_comb_axis(i)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 450, in _get_comb_axis
    intersect=self.intersect)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\api.py", line 42, in _get_objs_combined_axis
    return _get_combined_index(obs_idxes, intersect=intersect)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\api.py", line 57, in _get_combined_index
    union = _union_indexes(indexes)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\api.py", line 84, in _union_indexes
    return result.union_many(indexes[1:])
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py", line 1054, in union_many
    this = Index.union(this, other)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2239, in union
    indexer = self.get_indexer(other)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2687, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```
  • @SCB thanks for the edit – Sayed Gouda Jan 02 '18 at 08:24
  • Check [this question](https://stackoverflow.com/questions/35084071/concat-dataframe-reindexing-only-valid-with-uniquely-valued-index-objects). – Cleb Jan 02 '18 at 08:31
  • @Cleb same error after the edit – Sayed Gouda Jan 02 '18 at 15:16
  • I did not read it in detail, but it looks like in the other answer the `duplicated` part is run *before* the `concat`. Have you tried that? Check also the `reset_index` solution. – Cleb Jan 02 '18 at 16:01
  • @Cleb it is in the original code – Sayed Gouda Jan 02 '18 at 16:02
  • Where? You concat first and then run the `duplicated` step afterwards; in the other solution it is the other way round, if I see that correctly. – Cleb Jan 02 '18 at 16:08
  • I can't do it his way because I opened all the files in one line of code. He opened one file at a time, and I can't do that with around 900 files – Sayed Gouda Jan 02 '18 at 16:14
  • You could just run another list comprehension where you use the `duplicated` or `reset_index` part, I think. – Cleb Jan 02 '18 at 16:15
  • Sorry, but your idea is not clear to me; can you give me an example so I understand it right? I mean, it is 900 files with 5 columns each, so I can't figure out the right arguments (map, filter, etc.). – Sayed Gouda Jan 02 '18 at 16:18
  • It is a bit tricky as you don't provide any data :) I guess something like `dfs = [dfi.loc[~dfi.index.duplicated(keep='first')] for dfi in dfs]` prior to `concat` could(!) work. As written, difficult to say without any data. – Cleb Jan 02 '18 at 16:22
  • I added some data to the post – Sayed Gouda Jan 02 '18 at 16:25
  • Still no answer – Sayed Gouda Jan 02 '18 at 20:52
  • One can still not simply copy-and-paste your data. People here are not paid, so it is up to the person who asks the question to make it as easy as possible to answer. Did you try my line from above? Could you not just post three of the dataframes here so that one can copy them? – Cleb Jan 02 '18 at 21:08
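Cleb's suggestion in the comments can be illustrated with a minimal, self-contained sketch. The two small DataFrames below are hypothetical stand-ins for the real CSVs (which aren't available to copy); the point is that `pd.concat(..., axis=1)` requires each input's index to be unique, so duplicates must be dropped per frame *before* concatenating:

```python
import pandas as pd

# Hypothetical stand-ins for two of the 900 files: DateTime-indexed
# frames where one index contains a duplicated timestamp.
df1 = pd.DataFrame(
    {'Actual': [1.0, 1.1, 1.2]},
    index=pd.to_datetime(['2017-01-01', '2017-01-01', '2017-02-01']),
)
df2 = pd.DataFrame(
    {'Actual': [2.0, 2.1]},
    index=pd.to_datetime(['2017-01-01', '2017-03-01']),
)
dfs = [df1, df2]

# Drop duplicate index labels in each frame BEFORE concat;
# doing it after concat is too late, because concat itself raises.
dfs = [dfi.loc[~dfi.index.duplicated(keep='first')] for dfi in dfs]

# Now the outer join on the DateTime index succeeds.
finaldf = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
print(finaldf)
```

With the duplicate in `df1` removed, the union index holds three unique dates and the merged frame has one `Actual` column per source frame. The same list comprehension drops into the original script between the `read_csv` list and the `concat` call.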

0 Answers