0

I have probably hundreds or thousands small excel file with bracket into one pandas dataframe

Before I merge them, I need to give flag for which category they come from

Here's my table of reference df

    Dataframe_name      Path                                 Sheet
45  finance_auditing    Finance - Accounting/TopSites-Fin... Aggregated_Data_for_Time_Period
46  finance_lending     Finance - Banking/TopSites-...          Aggregated_Data_for_Time_Period

What I did Dataframe_name name column is filled manually, but what I expected is using refference table

finance_auditing  = pd.read_excel('Finance - Accounting/TopSites-Fin... ','Aggregated_Data_for_Time_Period')
finance_lending   = pd.read_excel('Finance - Banking/TopSites-... ','Aggregated_Data_for_Time_Period')
finance_auditing['Dataframe_name'] = 'finance_auditing'
finance_lending['Dataframe_name'] = 'finance_lending'
dF_all = pd.concat([pd.read_excel(path, sheet_name=sheet) 
           for path, sheet in zip(df.Path, df.Sheet)])

The problem is I have hundreds of of file to read and need to append them all

Nabih Bawazir
  • 6,381
  • 7
  • 37
  • 70
  • Concatenating all excel files into a single df should not be a challenge. But you would need to have a mapping for `Dataframe_name` to every excel file, which I think would be a manual process. Once you have this, all of it can be done in one single for loop. Check [`this`](https://stackoverflow.com/questions/20908018/import-multiple-excel-files-into-python-pandas-and-concatenate-them-into-one-dat) out. – Mayank Porwal May 17 '22 at 06:29
  • Why it should be manual? Already have references table – Nabih Bawazir May 17 '22 at 06:32
  • @NabihBawazir Ah. That's where the confusion was. Didn't realize you have a reference table already created. – Mayank Porwal May 17 '22 at 06:36
  • @MayankPorwal I guess my question not concise enough (before edited) – Nabih Bawazir May 17 '22 at 06:40

1 Answers1

1

This would be fairly simply, you can assign the flag dynamically for each iteration:

pd.concat([pd.read_excel(path, sheet_name=sheet).assign(df_name=name)
                             for name, path, sheet in df.to_numpy()])
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53