I'm new to Python and having some trouble looping over all the files in my directory.
I am trying to import data from all Excel files from all of the subfolders I have in one single directory. For example, I have a directory named "data" which has five different subfolders and each subfolder contains Excel files from which I want to extract data.
I guess my current code is not working because it only loops over the entries directly inside the directory and never goes into the subfolders. How do I modify my current code to extract data from all the subfolders in my directory?
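import os
import pandas as pd

data_location = "data/"
df_total = pd.DataFrame()  # assumed to start empty; it must exist before the first concat
for file in os.listdir(data_location):
    # os.listdir returns only the direct children of "data/", not files inside subfolders
    df_file = pd.read_excel(data_location + file)
    df_file.set_index(df_file.columns[0], inplace=True)
    selected_columns = df_file.loc["Country":"Score", :'Unnamed: 1']
    selected_columns.dropna(inplace=True)
    df_total = pd.concat([selected_columns, df_total], ignore_index=True)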
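One way the subfolders could be reached is to collect the file paths recursively before reading anything. The following is only a sketch under assumptions: `find_excel_files` is a hypothetical helper name, and it assumes the Excel files end in `.xlsx` or `.xls`.

```python
from pathlib import Path


def find_excel_files(root):
    """Return every .xlsx/.xls file under root, including all subfolders."""
    root = Path(root)
    # rglob("*") walks the whole tree below root, unlike os.listdir,
    # which only lists the direct children of a single directory.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".xlsx", ".xls"}
    )
```

Each returned path could then be passed to `pd.read_excel` in place of `data_location + file`, for example by looping over `find_excel_files("data")`.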
Also, I've been trying to create a new variable from each file name as I import the files. For example, if there are 5 files (file1 to file5) in a directory, I want to create a new variable (column) called "Source" whose values are file1, file2, file3, file4, and file5. I want Python to fill in this value for the new variable as it imports each file in the loop. Could anyone please help me with this?
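To make the "Source" idea concrete, here is a rough sketch of the tagging step. It is an illustration under assumptions, not a confirmed solution: `tag_with_source` and `combine` are hypothetical helper names, and the value stored is assumed to be the file name without its extension.

```python
from pathlib import Path

import pandas as pd


def tag_with_source(df, path):
    """Return a copy of df with a "Source" column set to the file name (no extension)."""
    # Path("data/sub1/file1.xlsx").stem is "file1"
    return df.assign(Source=Path(path).stem)


def combine(frames_by_path):
    """Concatenate a {path: DataFrame} mapping into one frame, tagging rows by source file."""
    tagged = [tag_with_source(df, path) for path, df in frames_by_path.items()]
    return pd.concat(tagged, ignore_index=True)
```

In the import loop, the same effect could be had by calling `tag_with_source(selected_columns, file)` right before the `pd.concat` step, so that every row carries the name of the file it came from.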