Importing data from multiple files using python

Question

I have to make an application in which I have to import all the excel files in the given folder and add it to a dataframe. The dataframe should look as shown:

Expected Data Frame

As seen in the image one of the columns for the dataframe is the name of the file.

I have successfully added that column in the final dataframe and the code is as follows:


import pandas as pd
import os
import shutil
import re  
path = 'C:/Users/Administrator/Desktop/Zerodha/Day2'
lst = os.listdir(path)
files = [os.path.join(path,x) for x in lst]
print(lst)
dataframes_lst = []

for file in files:
 filename = file.split('/')[-1]
 dataframe = pd.read_csv(file, usecols=[0,4], names ["date",filename],index_col=["date"])
dataframes_lst.append(dataframe)

df = pd.concat(dataframes_lst, axis=1)

print(df)


df.to_csv('data.csv')

The dataframe which is obtained using this code is as displayed:

For reference I will attach the snippet of one of the excel files:

Excel snippet

Also as seen there are many nan values obtained. I tried to remove them by using pd.dropna(inplace = True) function and also by doing as suggested in this post:

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

But the resultant dataframe still contains the nan values.

I have solved the nan issue by using the fillna function – Huzefa Sadikot Nov 19 '20 at 09:22 — Huzefa Sadikot, Nov 19 '20 at 09:22

Orkun Berk Yuzbasioglu · Accepted Answer · 2020-11-18T10:14:09.587

Regarding

My doubt is that how do I loop through all the files in the directory and extract data of each file in the required format

You can loop through all the files in the directory and extract the data and filename as the header of the dataframe as such:

import pandas as pd
import os

path = './data'
lst = os.listdir('./data/')
files = [os.path.join(path,el) for el in lst]

and the structure of example.xlsx is:

dataframes_lst = []

for file in files:
    filename = file.split('/')[-1]
    dataframe = pd.read_excel(file, usecols=[3], names=[filename])
    dataframes_lst.append(dataframe)

df = pd.concat(dataframes_lst, axis=1)
print(df)

Here, the dataframes are concatenated along the axis=1 and the output of print(df)is

Thanks for your time. This is the solution I was looking for, — Huzefa Sadikot, Nov 18 '20 at 10:08

score 0 · Answer 2 · answered Nov 18 '20 at 09:20

0

Try this:

import pandas as pd    
from pathlib import Path
read_path = Path('C:/Users/Administrator/Desktop/Zerodha/Day2')
    
df = pd.concat([pd.read_csv(path) for path in read_path.glob('*.csv')])

If you want to read from excel just use read_excel and change the pattern to '*.xlsx'

answered Nov 18 '20 at 09:20

skarit

11
1
3

I am getting the data but not as expected in the Expected Data Frame. Your code gives me the data in the format mentioned in the Excel snippet. I need it in the format as mentioned in the Expected Data Frame – Huzefa Sadikot Nov 18 '20 at 09:33

Importing data from multiple files using python

2 Answers2