R Script- Cumsum() reseting when there is a new customer id

Question

R Script Question:

I have a data set titled "earnings101.csv" which has the following columns: {sampleid, quarter, quarterlyearn, & grade}. For every sampleid there are 4 quarters. I need help creating a new column titled "experience" which keeps a running sum of number of years of experience. Someone earns a year of experience if quartlyearn > 0. Screenshot of dataset below:

Screenshot of Sample Data Here

Please provide reproducible and copy&paste-able sample data. A screenshot is never a good idea as we can't extract data from an image. It sounds like you want to use grouping, which you can do in various ways (e.g. using `dplyr::group_by` or `ave` in base R). — Maurits Evers, Feb 25 '20 at 23:50

Maurits Evers · Answer 1 · 2020-02-26T00:26:33.207

First off, for future posts please provide reproducible and copy&paste-able sample data; we cannot extract data from screenshots. For details see how to provide a minimal reproducible example/attempt.

As to your question, a base R approach uses ave

transform(df, experience = ave(quarterlyearn, sampleid, FUN = function(x) cumsum(x > 0)))

Or using dplyr::group_by

library(dplyr)
df %>% group_by(sampleid) %>% mutate(experience = cumsum(quarterlyearn > 0))

Or using data.table syntax

library(data.table)
setDT(df)
df[, experience := cumsum(quarterlyearn > 0), by = sampleid]

All give the same result

#  sampleid quarter quarterlyearn grade experience
#1     1214       1           161    10          1
#2     1214       2           523     9          2
#3     1214       3             0     8          2
#4     1214       4           919     9          3
#5     3441       1            42    12          1
#6     3441       2             0     4          1
#7     3441       3            14     7          2
#8     3441       4             0    12          2

Sample data

df <- data.frame(
sampleid = c(1214, 1214, 1214, 1214, 3441, 3441, 3441, 3441),
quarter = c(1:4, 1:4),
quarterlyearn = c(161, 523, 0, 919, 42, 0, 14, 0),
grade = c(10, 9, 8, 9, 12, 4, 7, 12))

score 0 · Accepted Answer · answered Feb 25 '20 at 23:58

library(tidyverse)

df <- 
  tibble(
    sampleid = c(1214, 1214, 1214, 3441, 3441, 3441),
    quarter = c(1, 2, 3, 1, 2, 3),
    earn = c(3, 0, 3, 0, 1, 1)
  )

df %>% 
  group_by(sampleid) %>% 
  mutate(experience = cumsum(earn > 0)) %>% 
  ungroup()

  sampleid quarter  earn experience
     <dbl>   <dbl> <dbl>      <int>
1     1214       1     3          1
2     1214       2     0          1
3     1214       3     3          2
4     3441       1     0          0
5     3441       2     1          1
6     3441       3     1          2

This works by grouping by the sampleid variable and then doing a cumulative sum of the logical outcome of earn > 0, where false turns into a 0 and true turns into a 1. You'll want to run the ungroup command after creating the new column.

R Script- Cumsum() reseting when there is a new customer id

2 Answers2

Sample data