2

Stuck on a computation in R. Suppose I have the following dataframe:

Name | Date | Count
Bob | 2019-03-03 | 253
Bob | 2019-03-03 | 253
Bob | 2019-03-02 | 252
Bob | 2019-03-01 | 251
Tim | 2019-03-04 | 257
Tim | 2019-03-04 | 257
Tim | 2019-03-04 | 256
Tim | 2019-03-03 | 254

My goal is to engineer a column of absolute change, like so:

Name | Date | Count | Change
Bob | 2019-03-03 | 253 | 0
Bob | 2019-03-03 | 253 | 1
Bob | 2019-03-02 | 252 | 1
Bob | 2019-03-01 | 251 | 0
Tim | 2019-03-04 | 257 | 0
Tim | 2019-03-04 | 257 | 1
Tim | 2019-03-04 | 256 | 2
Tim | 2019-03-03 | 254 | 0

I can obviously

df %>% group_by(Name) %>% arrange(desc(Date)) %>% arrange(desc(Count))

but after that, I'm lost. Do I mutate(Change = Count) somehow?

Christopher Penn
  • 539
  • 4
  • 14
  • By absolute change, I mean the numerical difference from row to row (as opposed to percentage change). Change would be the fourth column. – Christopher Penn Apr 03 '19 at 23:20

3 Answers3

4

A solution using diff from base R and dplyr.

library(dplyr)
library(tidyr)

df2 <- df %>%
  group_by(Name) %>%
  mutate(Change = c(-diff(Count), 0)) %>%
  ungroup()
df2
# # A tibble: 8 x 4
#   Name  Date       Count Change
#   <chr> <chr>      <int>  <dbl>
# 1 Bob   2019-03-03   253      0
# 2 Bob   2019-03-03   253      1
# 3 Bob   2019-03-02   252      1
# 4 Bob   2019-03-01   251      0
# 5 Tim   2019-03-04   257      0
# 6 Tim   2019-03-04   257      1
# 7 Tim   2019-03-04   256      2
# 8 Tim   2019-03-03   254      0

DATA

df <- read.table(text = "Name|Date|Count
Bob|'2019-03-03'|253
Bob|'2019-03-03'|253
Bob|'2019-03-02'|252
Bob|'2019-03-01'|251
Tim|'2019-03-04'|257
Tim|'2019-03-04'|257
Tim|'2019-03-04'|256
Tim|'2019-03-03'|254",
                 header = TRUE, stringsAsFactors = FALSE, sep = "|")
www
  • 38,575
  • 12
  • 48
  • 84
3

We can group_by Name and use lead from dplyr to subtract values in the current row by next row.

library(dplyr)

df %>%
  group_by(Name) %>%
  mutate(Change = Count - lead(Count, default = last(Count)))

#  Name  Date       Count Change
#  <chr> <chr>      <dbl>  <dbl>
#1 Bob   2019-03-03   253      0
#2 Bob   2019-03-03   253      1
#3 Bob   2019-03-02   252      1
#4 Bob   2019-03-01   251      0
#5 Tim   2019-03-04   257      0
#6 Tim   2019-03-04   257      1
#7 Tim   2019-03-04   256      2
#8 Tim   2019-03-03   254      0

A base R approach using ave

with(df, ave(Count, Name, FUN = function(x) c(x[-length(x)] - x[-1], 0)))
#[1] 0 1 1 0 0 1 2 0
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

Using data.table

library(data.table)
setDT(df)[,  Change := Count - shift(Count, fill = last(Count), 
                type = 'lead'), Name][]
#   Name       Date Count Change
#1:  Bob 2019-03-03   253      0
#2:  Bob 2019-03-03   253      1
#3:  Bob 2019-03-02   252      1
#4:  Bob 2019-03-01   251      0
#5:  Tim 2019-03-04   257      0
#6:  Tim 2019-03-04   257      1
#7:  Tim 2019-03-04   256      2
#8:  Tim 2019-03-03   254      0

NOTE: It is also a duplicate of this or this or this or this post

akrun
  • 874,273
  • 37
  • 540
  • 662