I have a folder with several .csv files. Each contains data on Time, High, Low, Open, Volumefrom, Volumeto and Close for one cryptocurrency.

I managed to load the .csv files into a list of dataframes and drop the columns Open, High, Low, Volumefrom and Volumeto, which I don't need, leaving me with Time and Close for each dataframe.

Now I want to combine the list of dataframes into one dataframe whose index starts at the first timestamp of the youngest coin, which would be iota in this example.

[Image: Currency Data Frames]

This is the code I wrote so far:

import pandas as pd
import os

# Path to my folder
PATH_COINS = r"C:\Users\...\Coins"

# creating a path for each of the .csv-files and saving it into a list
namelist = [name for name in os.listdir(PATH_COINS)]
path_lists = [os.path.join(PATH_COINS, path) for path in namelist]

# creating the dataframes and saving them into a list
dfs = [pd.read_csv(k, index_col=0) for k in path_lists]

# dropping unwanted columns
for df in dfs:
    df.drop(columns=["Open", "High", "Low", "Volumefrom", "Volumeto"], inplace=True)

# combining the list of dataframes into one dataframe     
pd.concat(dfs, join="inner", axis=1)

However, I am getting an error message and can't figure out how to achieve my goal:

Traceback (most recent call last):
  File "C:/Users/Jonas/PycharmProjects/Pandas/main.py", line 16, in <module>
    pd.concat(dfs, join="inner", axis=1)
  File "C:\Users\Jonas\PycharmProjects\Pandas\venv\lib\site-packages\pandas\core\reshape\concat.py", line 226, in concat
    return op.get_result()
  File "C:\Users\Jonas\PycharmProjects\Pandas\venv\lib\site-packages\pandas\core\reshape\concat.py", line 423, in get_result
    copy=self.copy)
  File "C:\Users\Jonas\PycharmProjects\Pandas\venv\lib\site-packages\pandas\core\internals.py", line 5425, in concatenate_block_managers
    return BlockManager(blocks, axes)
  File "C:\Users\Jonas\PycharmProjects\Pandas\venv\lib\site-packages\pandas\core\internals.py", line 3282, in __init__
    self._verify_integrity()
  File "C:\Users\Jonas\PycharmProjects\Pandas\venv\lib\site-packages\pandas\core\internals.py", line 3493, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "C:\Users\Jonas\PycharmProjects\Pandas\venv\lib\site-packages\pandas\core\internals.py", line 4843, in construction_error
    passed, implied))
ValueError: Shape of passed values is (5, 8514), indices imply (5, 8490)

Parfait
Jones
  • You probably have duplicate index values in at least one of the DataFrames. If you see any `False` values in the output of `[df.index.is_unique for df in dfs]`, that is likely the source of this error. – Peter Leimbigler Sep 25 '18 at 23:27
  • Yes, I had duplicates in my .csv files. I didn't even think of that, because I got the data from an API and was expecting it to be clean. Thank you very much! – Jones Sep 26 '18 at 17:03

1 Answer

The inner join in pd.concat should work for this.

Check each DataFrame for duplicate index values (e.g. df.index.is_unique). When the same label appears more than once, pandas cannot tell which rows to line up across the DataFrames.
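The check described above can be tried on toy data. This is only a sketch: `btc` and `iota` are hypothetical stand-ins for two of the loaded frames, with integer timestamps invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two loaded coin frames (values are made up).
# The label 200 appears twice in btc, mimicking a duplicated API row.
btc = pd.DataFrame({"Close": [1.0, 2.0, 2.5, 3.0]}, index=[100, 200, 200, 300])
iota = pd.DataFrame({"Close": [0.5, 0.6]}, index=[200, 300])
btc.index.name = iota.index.name = "Time"
dfs = [btc, iota]

# One flag per frame: False marks a frame whose index has repeated labels.
unique_flags = [df.index.is_unique for df in dfs]

# The repeated labels themselves, so the offending rows can be inspected.
duplicated_labels = [
    df.index[df.index.duplicated(keep=False)].unique().tolist() for df in dfs
]

print(unique_flags)
print(duplicated_labels)
```

Here the first frame is flagged as non-unique and the label 200 is reported as the duplicate, while the second frame is clean.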

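Removing the duplicate index values should resolve it. Note that df.drop_duplicates() compares column values and ignores the index, so it can miss duplicated timestamps whose values differ and can drop distinct timestamps that happen to share a Close price. Filtering on the index itself, e.g. df[~df.index.duplicated(keep="first")], targets the repeated labels directly.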
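As a rough sketch of the whole fix, the snippet below drops repeated index labels and then concatenates with an inner join. The helper `drop_duplicate_index`, the frame names and the toy values are assumptions made for illustration, and keeping the first occurrence of a duplicated timestamp is an arbitrary choice that may not suit every data source.

```python
import pandas as pd


def drop_duplicate_index(df):
    """Hypothetical helper: keep only the first row for each index label."""
    return df[~df.index.duplicated(keep="first")]


# Toy frames standing in for the loaded .csv data (values are made up).
btc = pd.DataFrame({"Close": [1.0, 2.0, 2.5, 3.0]}, index=[100, 200, 200, 300])
iota = pd.DataFrame({"Close": [0.5, 0.6]}, index=[200, 300])
frames = {"btc": btc, "iota": iota}

# Deduplicate each frame and keep its Close column as a Series.
cleaned = {name: drop_duplicate_index(df)["Close"] for name, df in frames.items()}

# The dict keys become the column names; the inner join keeps only the
# timestamps present in every frame, so the result starts at the first
# timestamp of the shortest (youngest) series.
combined = pd.concat(cleaned, axis=1, join="inner")
print(combined)
```

With these toy values the combined frame has the columns btc and iota and the index 200 and 300, which is where the shorter iota series begins.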

Turtalicious