0

INPUT excel file:

PRODUCTPRICE    TIMEPLACED
2       14-JUN-18 02.53.33.347000 AM
3       14-JUN-18 03.05.15.210000 AM
4       14-JUN-18 02.26.11.959000 AM
30      14-JUL-18 02.53.33.347000 AM
20      14-JUL-18 03.05.15.210000 AM
20      14-MAR-19 02.53.33.347000 AM
10      14-MAR-18 03.05.15.210000 AM
30      14-MAR-18 02.26.11.959000 AM

output excel file.

TOTAL_PRODUCT_PRICE  DATE
9                   JUN-18
50                  JUL-18
60                  MAR-20



PRODUCTPRICE    TIMEPLACED
    2       14-JUN-18 02.53.33.347000 AM
    3       14-JUN-18 03.05.15.210000 AM
    4       14-JUN-18 02.26.11.959000 AM
    30      14-JUL-18 02.53.33.347000 AM
    20      14-JUL-18 03.05.15.210000 AM
    20      14-MAR-19 02.53.33.347000 AM
    10      14-MAR-18 03.05.15.210000 AM
    30      14-MAR-18 02.26.11.959000 AM

I will get input excel file like above now I want to write new excel by calculating total_product for each month and also want same details as in input excel below in new excel as well.what is the best method to achieve the result?

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
user739115
  • 1,117
  • 5
  • 20
  • 41

1 Answers1

0

This would be my approach:

Step 0: Try to get a better input format. Excel is not suitable for data processing.

Step 1. Read the original excel:
If step 0 fails, I would use pandas' read_excel function to get a DataFrame:

import pandas as pd
df_original = pd.read_excel('./your_input_excel.xlsx')

You likely will have to do date parsing. The parse_dates and date_parser parameters of read_excel could be of help. Documentation can be found over here: read_excel docs

Step 2. Store the original excel for later:
After having parsed your dates in your DataFrame, make a copy of it:

df_processing = df_original.copy()

Step 3. Sum the prices that have the same month-year:
With the df_processing DataFrame, you can sum all the values in the PRODUCTPRICE columns if they have the same month and year and assign the result to a TOTAL_PRODUCT_PRICE column. You can create a DATE column by only keeping the month and year information from TIMEPLACED.

Now you can drop the old PRODUCTPRICE and TIMEPLACED columns from df_processing and you have your output DataFrame.

Step 4. Write the resulting DataFrame to an excel file:
Write your output DataFrame to an excel file using the to_excel function:

df_processing.to_excel("target_excel.xlsx")

to_excel docs

Step 5. Append the original DataFrame:

Finally to append to the original excel, you can look into the following question: How to write to an existing excel file without overwriting data (using pandas)?

melvio
  • 762
  • 6
  • 24