
The purpose of this code is to merge 900 CSV files into a single file with a unified, reshaped DateTime index.

Each file has the same 5 columns: `['DateTime', 'Actual', 'Consensus', 'Previous', 'Revised']`.

The files are not equal in row count or time steps, and merging the 900 source files produces 4500 columns in the one merged file. Here is some sample data:

- https://www.fxstreet.com/economic-calendar/event/b2ba798d-1490-4a1b-9a7d-174eeb414926
- https://www.fxstreet.com/economic-calendar/event/7a515013-5178-468b-b14b-19351b984e33
- https://www.fxstreet.com/economic-calendar/event/7b280ce6-08b3-4b5a-83a7-1a6014d4bab7
- https://www.fxstreet.com/economic-calendar/event/08f4deaa-a536-4746-9983-2400a2db4722
- https://www.fxstreet.com/economic-calendar/event/4bd3c8bc-c6d4-4f02-bed8-28d0caae7beb
- https://www.fxstreet.com/economic-calendar/event/86ab275d-83f3-4760-a9f3-3a51f842029f
- https://www.fxstreet.com/economic-calendar/event/6b0f63fa-277c-458c-b7d1-3c0f2963deaa
- https://www.fxstreet.com/economic-calendar/event/69921d2c-072f-47df-ba4f-3cbf0efaa293
- https://www.fxstreet.com/economic-calendar/event/4e7f2b96-19aa-46f9-8f4a-403f2334fadb
- https://www.fxstreet.com/economic-calendar/event/a0d1effc-9698-477f-9da6-41db2f7a1dbe
- https://www.fxstreet.com/economic-calendar/event/7a262107-aa0b-478c-a0c4-49b2a8589738
- https://www.fxstreet.com/economic-calendar/event/42ead1f1-30c2-459d-84ed-4a2b0deaba8a
- https://www.fxstreet.com/economic-calendar/event/015035d6-7dfe-4bb0-a138-5362dbcdf309
- https://www.fxstreet.com/economic-calendar/event/30d4dcf7-be62-4401-8cdd-96d1a8a989da
- https://www.fxstreet.com/economic-calendar/event/dad82d0d-0561-4275-bdea-56f8e3709581
- https://www.fxstreet.com/economic-calendar/event/d080d54a-58ac-49fc-9463-589260fdc6de
- https://www.fxstreet.com/economic-calendar/event/70d803d1-04e2-496b-a565-5fe08e89cc48
- https://www.fxstreet.com/economic-calendar/event/04e37486-f4aa-4a54-875b-62fae3b62486
- https://www.fxstreet.com/economic-calendar/event/0205d838-1106-4d7d-abdd-692f33fb5686
- https://www.fxstreet.com/economic-calendar/event/727dc859-0e7d-4209-8f28-70acb2ea2d61
- https://www.fxstreet.com/economic-calendar/event/14dce315-f073-4ffb-9205-95c78e42929c
- https://www.fxstreet.com/economic-calendar/event/3a35946f-7a82-4e4f-9582-9c61676eecb3
- https://www.fxstreet.com/economic-calendar/event/98bb2374-b9f9-46ae-93e3-9f7e8a4391c1
- https://www.fxstreet.com/economic-calendar/event/a5d475bb-27fa-4a82-90ba-9fc271c5b9bd

Here is the code:

```python
import os
import pandas as pd

# Read every CSV in the folder, using the first column (DateTime)
# as a parsed datetime index
os.chdir('E:\\Business\\Economic Indicators')
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('csv')]

# Outer-join all frames on the DateTime index, newest rows first,
# then drop duplicate index labels from the merged result
finaldf = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
finaldf = finaldf.loc[~finaldf.index.duplicated(keep='first')]

print(finaldf.head())
finaldf.to_csv('finaldf.csv')
```

But I get this error:

```
Traceback (most recent call last):
  File "E:/Tutorial/Driver/phantomjs-2.1.1-windows/bin/adaa.py", line 8, in <module>
    finaldf = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 212, in concat
    copy=copy)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 363, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 430, in _get_new_axes
    new_axes[i] = self._get_comb_axis(i)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 450, in _get_comb_axis
    intersect=self.intersect)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\api.py", line 42, in _get_objs_combined_axis
    return _get_combined_index(obs_idxes, intersect=intersect)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\api.py", line 57, in _get_combined_index
    union = _union_indexes(indexes)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\api.py", line 84, in _union_indexes
    return result.union_many(indexes[1:])
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py", line 1054, in union_many
    this = Index.union(this, other)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2239, in union
    indexer = self.get_indexer(other)
  File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2687, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```
  • @SCB thanks for the edit – Sayed Gouda Jan 02 '18 at 08:24
  • Check [this question](https://stackoverflow.com/questions/35084071/concat-dataframe-reindexing-only-valid-with-uniquely-valued-index-objects). – Cleb Jan 02 '18 at 08:31
  • @Cleb same error after the edit – Sayed Gouda Jan 02 '18 at 15:16
  • I did not read it in detail, but it looks like in the other answer the `duplicated` part is run *before* the `concat`. Have you tried that? Check also the `reset_index` solution. – Cleb Jan 02 '18 at 16:01
  • @Cleb it is in the original code – Sayed Gouda Jan 02 '18 at 16:02
  • Where? You concat first and then run the `duplicated` step afterwards; in the other solution it is the other way round, if I see that correctly. – Cleb Jan 02 '18 at 16:08
  • I can't do it his way because I opened all the files in one line of code. He opened one file at a time, and I can't do that with around 900 files – Sayed Gouda Jan 02 '18 at 16:14
  • You could just run another list comprehension where you use the `duplicated` or `reset_index` part, I think. – Cleb Jan 02 '18 at 16:15
  • Sorry, but your idea is not clear to me; can you give me an example so I understand it right? I mean, it is 900 files with 5 columns each, so I can't figure out the right arguments (map, filter, etc.). – Sayed Gouda Jan 02 '18 at 16:18
  • It is a bit tricky as you don't provide any data :) I guess something like `dfs = [dfi.loc[~dfi.index.duplicated(keep='first')] for dfi in dfs]` prior to `concat` could(!) work. As written, difficult to say without any data. – Cleb Jan 02 '18 at 16:22
  • I added some data to the post – Sayed Gouda Jan 02 '18 at 16:25
  • Still no answer – Sayed Gouda Jan 02 '18 at 20:52
  • One can still not simply copy-and-paste your data. People here are not paid, so it is up to the person who asks the question to make it as easy as possible to answer. Did you try my line from above? Could you not just post three of the dataframes here so that one can copy them? – Cleb Jan 02 '18 at 21:08
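Cleb's suggestion in the comments can be illustrated with a minimal, self-contained sketch. The two small DataFrames below are hypothetical stand-ins for the real CSVs (which aren't available to copy); the point is that `pd.concat(..., axis=1)` requires each input's index to be unique, so duplicates must be dropped per frame *before* concatenating:

```python
import pandas as pd

# Hypothetical stand-ins for two of the 900 files: DateTime-indexed
# frames where one index contains a duplicated timestamp.
df1 = pd.DataFrame(
    {'Actual': [1.0, 1.1, 1.2]},
    index=pd.to_datetime(['2017-01-01', '2017-01-01', '2017-02-01']),
)
df2 = pd.DataFrame(
    {'Actual': [2.0, 2.1]},
    index=pd.to_datetime(['2017-01-01', '2017-03-01']),
)
dfs = [df1, df2]

# Drop duplicate index labels in each frame BEFORE concat;
# doing it after concat is too late, because concat itself raises.
dfs = [dfi.loc[~dfi.index.duplicated(keep='first')] for dfi in dfs]

# Now the outer join on the DateTime index succeeds.
finaldf = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
print(finaldf)
```

With the duplicate in `df1` removed, the union index holds three unique dates and the merged frame has one `Actual` column per source frame. The same list comprehension drops into the original script between the `read_csv` list and the `concat` call.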

0 Answers