1

I currently have a dataframe that looks like the below where I need to reset the cumsum everytime it crosses a multiple of 1000 ex (2000,3000...etc) and

                    Production    ID  cumsum  
     2017-10-19        1054  1323217    1054     
     2017-10-20           0  1323217    1054     
     2017-10-21           0  1323217    1054     
     2017-10-22           0  1323217    1054     
     2017-10-23           0  1323217    1054  

for example in the above, I need a df that looks like the below:

                 Production    ID      cumsum  adjCumsum numberGenerated
      2017-10-19        1054  1323217    1054     1000      1
      2017-10-20           0  1323217    1054     54        0
      2017-10-21           0  1323217    1054     54        0
      2017-10-22        3054  1323217    4108     4000      4
      2017-10-23           0  1323217    4018     108       0 
      2017-10-23         500  1323218    500      500       0

The below, correctly resets the value every 1000 but I can't quite seem to figure out how to translate this over grouping it by ID and rounding it to the 1000s.

maxvalue = 1000

lastvalue = 0
newcum = []
    for row in df.iterrows():
        thisvalue =  row[1]['cumsum'] + lastvalue
           if thisvalue > maxvalue:
              thisvalue = 0
           newcum.append( thisvalue )
           lastvalue = thisvalue
  df['newcum'] = newcum

thanks to the answer below I'm now able to calculate cumulative number generated, but I need to calc incremental # generated.

     df['cumsum'] = df.groupby('ID')['Production'].cumsum()
     thresh = 1000
     multiple = (df['cumsum'] // thresh )
     mask = multiple.diff().ne(0)
     df['numberGenerated'] = np.where(mask, multiple, 0)
     df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % 
     thresh

    df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()

My initial thinking was to try something similar to:

      df['numGen1'] = df['cumsum2'].diff()

Final Edit Tested and working. Thanks for the help getting here

I was overthinking it, below is how I was able to do it:

df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000

multiple = (df['cumsum'] // thresh )

mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh

df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()

numgen = []
adjcumsum = []

for i in range(len(df['cumsum'])):
    if df['cumsum'][i] > thresh and (df['ID'][i] == df['ID'][i-1]):
        numgenv = (df['cumsum'][i] // thresh) - (df['cumsum'][i-1] // thresh)
        numgen.append(numgenv)
    elif df['cumsum'][i] > thresh:
        numgenv = (df['cumsum'][i] // thresh)
        numgen.append(numgenv)
    else:
        numgenv = 0
        numgen.append(numgenv)

df['numgen2.0'] = numgen
user2679225
  • 159
  • 2
  • 4
  • 12
  • related : https://stackoverflow.com/questions/56904390/restart-cumsum-and-get-index-if-cumsum-more-than-value#56904650 – anky Jul 19 '19 at 16:58

1 Answers1

2

IIUC, this simply is an integer division problem with some tricks:

thresh = 1000
df['cumsum'] = df['Production'].cumsum()

# how many times cumsum passes thresh
multiple = (df['cumsum'] // thresh )

# detect where thresh is pass
mask = multiple.diff().ne(0)

# update the number generated:
df['numberGenerated'] = np.where(mask, multiple, 0)

# then the adjusted cumsum 
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh

Output:

            Production       ID  cumsum  adjCumsum  numberGenerated
2017-10-19        1054  1323217    1054       1054                1
2017-10-20           0  1323217    1054         54                0
2017-10-21           0  1323217    1054         54                0
2017-10-22        3054  1323217    4108       4108                4
2017-10-23           0  1323217    4108        108                0
2017-10-23         500  1323218    4608        608                0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Actually - i may have miss represented the problem. I need to calc the numberGenerated off the adjCumSum. In my example I wrote adjCumsum = 4108 it shoud have been 3054 + 54 = 3108 and number generated = 3. – user2679225 Jul 19 '19 at 17:50
  • It might be even easier, i.e, your `numberGenerated` is just `multiple.diff()`. – Quang Hoang Jul 19 '19 at 17:52
  • Can you exmplain? this is the first time running into this kind of problem. – user2679225 Jul 19 '19 at 18:21
  • `multiple.diff()` gives you the difference between an item and its previous item. Here, `multiple` is more or less `numberSoFar` (similar to `cumsum` but round down to thousands. So if you are looking for the incremental `numberGenerated`, isn't it exactly how much `multiple` changes? – Quang Hoang Jul 19 '19 at 18:24
  • I'm still struggling to understand this. where would you recommend I go for documentation? New to python so still a big beyond my level. – user2679225 Jul 20 '19 at 00:44