0

I have data in the following form in a csv file.

       **Item    Dept   Sale_Qty    Sale_Date   Start_Date  End_Date Sale_Fcst**   
       Item1    Dep1    237         3/30/2012   4/17/2012   4/17/2014   0
       Item1    Dep1    211         4/6/2012    4/17/2012   4/17/2014   0
       Item1    Dep1    155         4/13/2012   4/17/2012   4/17/2014   0
       Item1    Dep1    0           4/13/2012   4/17/2012   4/17/2014   125.5525055
       Item1    Dep1    145         4/20/2012   4/17/2012   4/17/2014   0
       Item1    Dep1    0           4/20/2012   4/17/2012   4/17/2014   107.5810699
       Item1    Dep1    179         4/27/2012   4/17/2012   4/17/2014   0
       Item1    Dep1    0           4/27/2012   4/17/2012   4/17/2014   89.33208466
       Item1    Dep1    0           5/4/2012    4/17/2012   4/17/2014   71.19781494
       Item1    Dep1    120         5/4/2012    4/17/2012   4/17/2014   0
       Item1    Dep1    0           5/11/2012   4/17/2012   4/17/2014   53.19638062
       Item1    Dep1    125         5/11/2012   4/17/2012   4/17/2014   0

What I want to retain is the following.

       **Item    Dept   Sale_Qty       Sale_Date   Start_Date  End_Date    Sale_Fcst**   
       Item1    Dep1    237         3/30/2012   4/17/2012   4/17/2014   0
       Item1    Dep1    211         4/6/2012    4/17/2012   4/17/2014   0
       Item1    Dep1    155         4/13/2012   4/17/2012   4/17/2014   125.5525055
       Item1    Dep1    145         4/20/2012   4/17/2012   4/17/2014   107.5810699
       Item1    Dep1    179         4/27/2012   4/17/2012   4/17/2014   89.33208466
       Item1    Dep1    120         5/4/2012    4/17/2012   4/17/2014   71.19781494
       Item1    Dep1    125         5/11/2012   4/17/2012   4/17/2014   53.19638062

So that the Sale_Date values do not repeat and the information in the columns Sale_Qty and Sale_Fcst get combined into the same row. Is there a way using excel command or say, Python/pandas to achieve the same?

I have spent days trying to do it using excel but failed. Trying manually is driving me crazy since the records run into thousands of rows! I am even trying to learn Python to achieve the same but failure stares at me there as well!

Any help would be much appreciated.

Thanks!

5122014009
  • 3,766
  • 6
  • 24
  • 34
  • 2
    Use `df.groupby(['Item','Dept','Sale_Date','Start_Date','End_Date'], sort=False, as_index=False).sum()` – jezrael Feb 21 '23 at 10:42

0 Answers0