0

I have one excel file that has multiple sheets that share some columns and rows (all numeric data). I found this good question but that not really answer my question, Import multiple excel sheets from different files into python and concatenate them into one dataframe

I am looking for a way to loop through all the sheets located in one single excel file and combine them into one table by summing the shared columns and rows.

For example, one sheet:

  Subject   Modifier    Tech_A   Chart_A  Unknown_B  Tech_B
    A         Tech_A      1        4         1         2
    A         Chart_A     3        3         2         4
    B         Unknown_B   4        2         4         5
    B         Tech_B      5        1         3         3

Another sheet,

  Subject   Modifier    Tech_A  Chart_A  Unknown_A Unknown_B  Tech_B  Chart_B
    A         Tech_A      1        4         1         2        1       1
    A         Chart_A     3        3         2         4        3       2
    A         Unknown_A   1        2         5         3        3       4
    B         Unknown_B   4        2         4         5        5       3
    B         Tech_B      5        1         3         3        2       5
    B         Chart_B     1        1         2         4        2       0

The combined single file will merge all the data and sum the similar ones,

  Subject   Modifier    Tech_A  Chart_A  Unknown_A Unknown_B  Tech_B  Chart_B
    A         Tech_A      2        8         1         3        3       1
    A         Chart_A     6        6         2         6        7       2
    A         Unknown_A   1        2         5         3        3       4
    B         Unknown_B   8        4         4         9        10      3
    B         Tech_B      10       2         3         6        6       5
    B         Chart_B     1        1         2         4        2       0

How can I achieve this in python?

Thanks in advance

Amal Nasir
  • 164
  • 15

1 Answers1

1

try something like this:

df = pd.concat(pd.read_excel('2018_Sales_Total.xlsx', sheet_name=None), ignore_index=True)
df = df.groupby('Modifier').sum()

According to the documentation, specifing sheet_name to 'None' you get all sheets.

slb20
  • 127
  • 1
  • 7
  • This will put everything together. The data is huge and this will put everything there. Is there a way to loop through the sheets and sum while adding them? – Amal Nasir May 26 '21 at 19:01
  • i've edited my answer. By executing ```groupby``` and summarizing the values you should get to what you desire. – slb20 May 27 '21 at 08:57
  • This calculated the sum. But for somehow I got duplicate columns and those duplicate has different values. Thank you for help! – Amal Nasir May 27 '21 at 21:05