
I have multiple .csv files. They all have the same number of columns but different numbers of rows. I want to build a structure whose third dimension indexes the files. I tried reading each file into a DataFrame and appending it to a list, but converting that list to a DataFrame gives a two-dimensional result (with 5 files, the output is a (5, 1) DataFrame).

import os

import pandas as pd

path = "Something"
filelist = os.listdir(path)
print(filelist)
all_csv_files = []
for x in filelist:
    df = pd.read_csv(os.path.join(path, x))
    all_csv_files.append(df)

# Produces a 2-D DataFrame, not the 3-D structure I want
dataset = pd.DataFrame(all_csv_files)
print(dataset.shape)
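A pandas DataFrame is strictly two-dimensional, so no DataFrame constructor call will give the files their own axis. If the aim is just to keep each file separate and addressable by name, one simple option is a dict of DataFrames. The sketch below writes two small hypothetical files (`file1.csv`, `file2.csv`) to a temporary directory so it runs on its own; replace `tmpdir` with your own path.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data written to a temporary directory so the sketch is self-contained
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "file1.csv"), "w") as f:
    f.write("w,x,y,z\n1,2,3,4\n5,6,7,8\n9,10,11,12\n")
with open(os.path.join(tmpdir, "file2.csv"), "w") as f:
    f.write("w,x,y,z\n13,14,15,16\n17,18,19,20\n")

# Map each file name to its own DataFrame; each DataFrame keeps its own row count
frames = {
    name: pd.read_csv(os.path.join(tmpdir, name))
    for name in sorted(os.listdir(tmpdir))
    if name.endswith(".csv")
}

for name, df in frames.items():
    print(name, df.shape)
```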

I also tried reading each file into a NumPy array and stacking them with np.stack, but the arrays are not the same size. pandas.Panel is deprecated, so that is not an option either.
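That failure is expected: `np.stack` joins arrays along a new axis and requires every input to have exactly the same shape. A minimal illustration with zero-filled stand-in arrays (not real file data):

```python
import numpy as np

a = np.zeros((3, 4))  # stands in for a file with 3 rows
b = np.zeros((2, 4))  # stands in for a file with 2 rows

# Unequal shapes: np.stack raises ValueError
stack_failed = False
try:
    np.stack([a, b])
except ValueError:
    stack_failed = True

# Equal shapes: np.stack adds a new leading axis
c = np.stack([a, a])
print(stack_failed, c.shape)
```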

For example, suppose there are two CSV files. The first one is:

a,b,c,d
a,b,d,c
b,x,y,z

and the second one is:

1,2,3,4
2,3,5,4

I want the output to look like this:

[
  [[a,b,c,d],[a,b,d,c],[b,x,y,z]],
  [[1,2,3,4],[2,3,5,4],[NaN, NaN, NaN, NaN]]
]

which has shape (2, 3, 4).

I would prefer not to pad with NaN, but if there is no other way, that is also OK.
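One way to get the (2, 3, 4) layout is to pad every file to the longest row count and then call `np.stack`. This is a sketch, not a library feature: `stack_padded` is a hypothetical helper, it uses `dtype=object` because the first example file holds strings (for purely numeric files `dtype=float` would be the usual choice), and the `fill` argument lets you pick a sentinel other than NaN, such as `None`, if you would rather not use NaN.

```python
import numpy as np

# The two example "files" from the question, already parsed into rows
file1 = [["a", "b", "c", "d"], ["a", "b", "d", "c"], ["b", "x", "y", "z"]]
file2 = [[1, 2, 3, 4], [2, 3, 5, 4]]


def stack_padded(tables, fill=np.nan):
    """Hypothetical helper: pad every table to the longest row count with `fill`,
    then stack the padded tables along a new leading axis."""
    arrays = [np.asarray(t, dtype=object) for t in tables]
    n_rows = max(arr.shape[0] for arr in arrays)
    n_cols = arrays[0].shape[1]  # assumes all files share the same column count
    padded = []
    for arr in arrays:
        out = np.full((n_rows, n_cols), fill, dtype=object)
        out[: arr.shape[0]] = arr  # copy the real rows; the rest stay as `fill`
        padded.append(out)
    return np.stack(padded)


result = stack_padded([file1, file2])
print(result.shape)  # (2, 3, 4)
```

If any padding at all is unacceptable, a 3-D NumPy array cannot hold the data, since every slice along the first axis must have the same shape; a plain list or dict of per-file arrays is then the natural container.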

2 Answers


If all your CSV files have the same columns, you can try the code below. I have added header=0 so that the first row of each file is used as the column names.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

You can read the Stack Overflow question "Import multiple csv files into pandas and concatenate into one DataFrame"; from there you can easily solve your scenario.

You can use asyncio to speed up reading all the xyz.csv files.
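`pd.concat` always returns a 2-D frame, but passing `keys=` adds an outer index level that records which file each row came from, so each file can still be pulled back out by its key. A minimal sketch with two small in-memory DataFrames standing in for CSV files (the data and the key names `file1`/`file2` are hypothetical):

```python
import pandas as pd

# Two small DataFrames standing in for two CSV files (hypothetical data)
df1 = pd.DataFrame({"w": [1, 5, 9], "x": [2, 6, 10]})
df2 = pd.DataFrame({"w": [13, 17], "x": [14, 18]})

# keys= adds an outer index level naming the source of each block of rows
frame = pd.concat([df1, df2], keys=["file1", "file2"])

print(frame.shape)               # (5, 2): still 2-D, rows from both files
print(frame.loc["file2"].shape)  # (2, 2): one file recovered by its key
```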

Ravi
  • concat does not add another dimension, so the result is still a 2-D DataFrame with all rows placed one after another. With my files the output is (11994, 815), where 815 is the number of columns and 11994 is the total number of rows across the files (each has around 2k rows). – vahid bashiri Aug 07 '20 at 07:36
  • @vahidbashiri Please share a dummy CSV file and I will try to give a solution. – Ravi Aug 07 '20 at 08:50
  • It is a typical CSV file. I edited the question to include a dummy CSV file. – vahid bashiri Aug 07 '20 at 15:06

You can use np.stack for that:

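import os

import numpy as np
import pandas as pd

path = "Something"
filelist = os.listdir(path)
print(filelist)
all_csv_files = []
for x in filelist:
    df = pd.read_csv(os.path.join(path, x))
    # Note: this stacks df with itself and overwrites `dataset` on every iteration
    dataset = np.stack((df, df))
print(dataset.shape)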
cocojambo
  • Please test your solutions before posting. Your loop just stacks one dataframe with itself. And for `np.stack` to work with the OP's example, the dataframes first need to be reshaped to same size. – AlexK Oct 06 '22 at 07:55
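Following the comment's point, the frames need to be made the same size before `np.stack` can combine them, and `np.stack` should be called once after the loop rather than inside it. A sketch assuming every file has a header row and the same columns in the same order; the directory and file names (`a.csv`, `b.csv`) are hypothetical and are created in a temporary directory so the example runs standalone. `DataFrame.reindex` pads the shorter frames with NaN rows:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical directory with two CSV files of different lengths
path = tempfile.mkdtemp()
with open(os.path.join(path, "a.csv"), "w") as f:
    f.write("w,x,y,z\n1,2,3,4\n2,3,5,4\n9,9,9,9\n")
with open(os.path.join(path, "b.csv"), "w") as f:
    f.write("w,x,y,z\n1,2,3,4\n2,3,5,4\n")

# Read every file once, collecting the DataFrames in a list
frames = [
    pd.read_csv(os.path.join(path, name))
    for name in sorted(os.listdir(path))
    if name.endswith(".csv")
]

# Reindex each frame to the longest row count; the added rows are all NaN
max_rows = max(len(df) for df in frames)
padded = [df.reindex(range(max_rows)) for df in frames]

# Stack once, after the loop, so each file becomes one slice along axis 0
dataset = np.stack([df.to_numpy() for df in padded])
print(dataset.shape)  # (2, 3, 4)
```

Because the NaN padding forces a float dtype, the stacked array here comes out as float64 even though the inputs are integers.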