I have to build an application that imports all the Excel files in a given folder and adds them to a dataframe. The dataframe should look as shown:
As seen in the image, one of the columns of the dataframe is the name of the file.
I have successfully added that column to the final dataframe, and the code is as follows:
import pandas as pd
import os
import shutil
import re
path = 'C:/Users/Administrator/Desktop/Zerodha/Day2'
lst = os.listdir(path)
files = [os.path.join(path,x) for x in lst]
print(lst)
dataframes_lst = []
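for file in files:
    # os.path.join uses '\' on Windows, so split('/') would not isolate the file name
    filename = os.path.basename(file)
    # column 0 becomes the "date" index; column 4 is named after the file
    dataframe = pd.read_csv(file, usecols=[0, 4], names=["date", filename], index_col=["date"])
    dataframes_lst.append(dataframe)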
df = pd.concat(dataframes_lst, axis=1)
print(df)
df.to_csv('data.csv')
The dataframe obtained with this code is displayed below:
For reference, here is a snippet of one of the Excel files:
Also, as seen above, the result contains many NaN values. I tried to remove them with df.dropna(inplace=True) and also by following the suggestion in this post:
But the resulting dataframe still contains the NaN values.
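
To illustrate where the NaN values may come from, here is a minimal sketch. It assumes the files do not all cover the same dates; the two small frames, their column names, and their values are made up for illustration and are not taken from the real files. With axis=1, pd.concat aligns rows on the index and uses an outer join by default, so a date missing from one file shows up as NaN in that file's column.

```python
import pandas as pd

# Hypothetical stand-ins for two of the files: same layout, different dates.
a = pd.DataFrame(
    {"a.csv": [1.0, 2.0, 3.0]},
    index=pd.Index(["2020-01-01", "2020-01-02", "2020-01-03"], name="date"),
)
b = pd.DataFrame(
    {"b.csv": [10.0, 30.0, 40.0]},
    index=pd.Index(["2020-01-01", "2020-01-03", "2020-01-04"], name="date"),
)

# Default outer join: every date from either file is kept,
# and a date missing from one file becomes NaN in that column.
outer = pd.concat([a, b], axis=1)

# how="any" drops a row if any column is NaN;
# how="all" would drop it only if every column is NaN.
complete_rows = outer.dropna(how="any")

# join="inner" keeps only the dates present in every file.
inner = pd.concat([a, b], axis=1, join="inner")

print(outer)
print(complete_rows)
print(inner)
```

If the real files share only a few dates, dropping rows with how="any" can leave very little data, so it may be worth checking how much the dates overlap before deciding which rows to drop.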