-1

I have some data of 50 people in 50 different excel files placed in the same folder. For each person the data is present in five different files like shown below:

Example: Person1_a.xls, Person1_b.xls, Person1_c.xls, Person1_d.xls, Person1_e.xls.

Each excel sheet has two columns and multiple sheets. I need to create a file Person1.xls which will have the second column of all these files, combined. Same process should be applicable for all the 50 people.

Any suggestions would be appreciated.

Thank you!

sophocles
  • 13,593
  • 3
  • 14
  • 33
lv8185
  • 15
  • 4
  • 1
    What have you tried? what errors have you had? I think the first step is to parse all the files you can do that using the `os` or `pathlib` module – Umar.H Jan 28 '21 at 10:06

1 Answers1

1

I have created a trial folder that I believe is similar to yours. I added data only for Person1 and Person3.

In the attached picture, the files called Person1 and Person3 are the exported files that include only the 2nd column for each person. So each person has their own file now.

enter image description here

I added a small description on what each line does. Please let me know if something is not clear.

import pandas as pd
import glob

path = r'C:\..\trial' # use your path where the files are
all_files = glob.glob(path + "/*.xlsx") # will get you all files with an extension .xlsx in a folder

li = []
for i in range(0,51): # numbers from 1 to 50 (for the 50 different people)
    for f in all_files:
        if str(i) in f: # checks if the number (i) is in the excel name
            df = pd.read_excel(f,
                                 sheet_name=0, # import 1st sheet
                                 usecols=([1])) # only import column 2
            df['person'] = f.rsplit('\\',1)[1].split('_')[0] # get the name of the person in a column
            li.append(df) # add it to the list of dataframes

all_person = pd.concat(li, axis=0, ignore_index=True)  # concat all dataframes imported      

Then you can export to the same path, a different excel file for each different person

for i,j in all_person.groupby('person'):
    j.to_excel(f'{path}\{i}.xlsx', index = False)

I am aware that this is probably not the most efficient way, but it will probably get you what you need.

sophocles
  • 13,593
  • 3
  • 14
  • 33