1

I have a data frame in python as shown in the screenshot. the logic for desire output is basically for each product ID, first year value[0]=max(cumulative profit[0],0), subsequent year it is max(0, cumulative profit[i]-sum of values in earlier years. this logic need to be applied to each product ID. can someone please help? I have tried a few ways like using double loop. no success so far. Many thanks. [dataframe][1]

df=pandas.dataframe({'Product ID':['A','A','A','A','B','B','C'],'Year':[1,2,3,4,1,2,1],'Cumulative Profit':[10,8,12,18,-5,10,2],'Desired output':[10,0,2,6,0,10,2]})

windvanish
  • 11
  • 2
  • Could you please include the data in your question as text, not as a picture? For instance `df = pandas.DataFrame({'Product ID': ['A', 'A', ...], 'Year': [...], ...})` – Stef Nov 23 '21 at 14:05
  • Would this similar question help you? [Calculate DataFrame values recursively](https://stackoverflow.com/questions/43710057/calculate-dataframe-values-recursively) – Stef Nov 23 '21 at 14:32

1 Answers1

0

There should be a clever way to do this using standard pandas methods.

Until I find such a way, here is a way to do it using loops. Please note that using explicit loops is usually slower than using pandas methods.

import pandas as pd

def aggreg_values(df):
  for _,g in df.groupby('id'):
    acc = 0
    for i,row in g.iterrows():
      val = max(0, row['v'] - acc)
      acc += val
      yield val


df = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b', 'b'], 'y': [1,2,3,4,1,2], 'v': [10, 8, 12, 18, -5, 10]})

df['output'] = list(aggreg_values(df))

print(df)
#   id  y   v  output
# 0  a  1  10      10
# 1  a  2   8       0
# 2  a  3  12       2
# 3  a  4  18       6
# 4  b  1  -5       0
# 5  b  2  10      10
Stef
  • 13,242
  • 2
  • 17
  • 28
  • thank you very much. this works. could you explain what does -,g mean? – windvanish Nov 23 '21 at 16:45
  • When I applied to my actual data, I think one thing caused the logic to break is that my ID contains text with space and special sign. For example, Samsumg TV Model (TV-4571). When I replace these ID with A, B etc, then it worked well. is there any solution? – windvanish Nov 24 '21 at 03:53
  • After trial and error, I finally realized the issue I face with production data is because it need to be sorted in order to work. Maybe this is back to the _,g syntax in the loop. Appreciate @Step can provide some explanation. If some can find a pandas standard method instead of using loops, greatly appreciate if you can share. – windvanish Nov 24 '21 at 04:24
  • Yes, the dataframe should be sorted before applying this method. – Stef Nov 24 '21 at 08:28
  • `df.groupby('id')` is an iterator of pairs; for instance, you can try `for x,g in df.groupby('id'): print(x); print(g)` to display the groups. Here `x` will be the grouping key, and `g` will be the sub-dataframe. In the code in my answer, I don't care about `x`, so I called it `_` instead of `x`. This is standard python syntax to ignore a variable. – Stef Nov 24 '21 at 08:31