
I'm having a little issue with the following code:

import pandas as pd

def process_row(row):
    liens = row['Liens']  # path to one Excel file
    df_excel = pd.read_excel(liens)
    df_excel['Name'] = row['Name']
    print(df_excel)
    return df_excel

result_df = new_df.apply(process_row, axis=1)

new_df is a DataFrame whose 'Liens' column contains Windows paths to several Excel files, such as C:/Documents/.../test.xlsx. When I print df_excel, it displays the data from the Excel file correctly, but that data does not end up in result_df: instead I get a Series that is completely different from df_excel, and I have no idea why. I would like result_df to be a single DataFrame that combines the data from all of the Excel files.

Thanks for your help!!

HugoLny
  • "it returns me a serie" -- what does this mean? – Scott Hunter Jan 04 '23 at 15:41
  • Sorry, I just corrected it. I meant that what it returns is a Series, but it is completely different from what is printed for df_excel – HugoLny Jan 04 '23 at 15:44
    Please take a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your question to show a sample input, and current vs expected output to make a minimal reproducible example so that we can better understand your question – G. Anderson Jan 04 '23 at 15:50

1 Answer


If your goal is to read a bunch of Excel files based on information in a DataFrame and combine the resulting DataFrames, something like this should do the trick:

import pandas as pd


def main():

    new_df = pd.DataFrame({"col_a": ["a", "b", "c"]})

    def process_row(row) -> pd.DataFrame:

        other_df = pd.DataFrame({"col_b": ["d", "e", "f"]})
        other_df["col_a"] = row.col_a

        return other_df

    # Iterate over all rows, process the row and concat the resulting dataframes
    result_df = pd.concat(
        map(process_row, new_df.itertuples()),
        ignore_index=True
    )

    print(result_df)


if __name__ == "__main__":
    main()

The output is as follows:

  col_b col_a
0     d     a
1     e     a
2     f     a
3     d     b
4     e     b
5     f     b
6     d     c
7     e     c
8     f     c

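You can substitute your own process_row and DataFrame here. Note that itertuples() yields namedtuples, so inside process_row you access columns as attributes (row.Liens, row.Name) rather than with row['Liens'].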
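If you would rather keep the question's process_row as it is (it indexes row['Liens']), a minimal sketch is to iterate with iterrows(), which yields each row as a Series, and hand the list of per-file DataFrames to pd.concat. DataFrame.apply(..., axis=1) is geared toward functions that return a scalar or a Series per row, which is likely why the per-file DataFrames never get stacked into result_df. The helper combine_excel_files and the read_excel parameter are hypothetical additions, used only so the example can run with in-memory DataFrames standing in for real .xlsx files; in practice leave read_excel at its default, pd.read_excel.

```python
import pandas as pd


def process_row(row: pd.Series, read_excel=pd.read_excel) -> pd.DataFrame:
    # Read the Excel file whose path is stored in the 'Liens' column
    df_excel = read_excel(row["Liens"])
    # Tag every row of that file with the 'Name' value from new_df
    df_excel["Name"] = row["Name"]
    return df_excel


def combine_excel_files(new_df: pd.DataFrame, read_excel=pd.read_excel) -> pd.DataFrame:
    # Hypothetical helper: iterrows() yields (index, Series) pairs,
    # so row["Liens"] works exactly as in the question
    frames = [process_row(row, read_excel) for _, row in new_df.iterrows()]
    # Stack the per-file DataFrames vertically; ignore_index gives a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Hypothetical in-memory stand-ins for two Excel files, so the sketch runs without disk access
fake_files = {
    "a.xlsx": pd.DataFrame({"value": [1, 2]}),
    "b.xlsx": pd.DataFrame({"value": [3]}),
}
new_df = pd.DataFrame({"Liens": ["a.xlsx", "b.xlsx"], "Name": ["first", "second"]})
result_df = combine_excel_files(new_df, read_excel=lambda path: fake_files[path].copy())
print(result_df)
```

Because ignore_index=True is passed, the combined result gets a new 0..n-1 index instead of repeating each file's own index. If new_df is empty, pd.concat raises a ValueError because there is nothing to concatenate.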

Maurice