I am trying to extract the data from xlsx file from the multiple folders and also to get the folder name for each row to identify where the is extracted from. I am able to extract the data from all the folders however i am unable to get the folder names to the dataframe. Please help.
Folder structure -
Month-Year - 2020-02 Day folder - 2020-02-01
Under the day folder consist of xlsx files.
paths = []
arr = []
for root, dirs, files in os.walk(Full_Path):
for file in files:
if file.endswith(".xlsx"):
#print(os.path.join(root, file))
ab = os.path.join(root, file)
print(ab)
arr.append(paths)
paths.append(ab)
for lm in arr:
print(lm)
all_data = pd.DataFrame()
for f in paths:
df = pd.read_excel(f, sheet_name='In Queue', usecols = fields)
df['Date'] = lm
all_data = all_data.append(df,ignore_index=True)
I have also tried different ways but not getting the output.