
I have a long-format data frame with one row per hospital visit, so each subject can appear on several rows.

ID VISIT_DATE COPD DIABETES
1 2020-01-01 1 0
2 1965-01-01 0 0
3 1989-01-01 0 0
1 2020-02-10 1 1
2 1970-01-01 0 1
3 1995-01-01 1 1

I want to create a new variable, VISIT_NUMBER, that numbers each subject's visits consecutively (1, 2, ...).

ID VISIT_DATE COPD DIABETES VISIT_NUMBER
1 2020-01-01 1 0 1
2 1965-01-01 0 0 1
3 1989-01-01 0 0 1
1 2020-02-10 1 1 2
2 1970-01-01 0 1 2
3 1995-01-01 1 1 2

I have used dplyr for similar tasks in the past, but I am stuck on how to do this.

Ava Wilson

2 Answers


base R

# seq_along numbers the rows within each ID group, in their current row order
dat$VISIT_NUMBER <- ave(dat$ID, dat$ID, FUN = seq_along)
dat
#   ID VISIT_DATE COPD DIABETES VISIT_NUMBER
# 1  1 2020-01-01    1        0            1
# 2  2 1965-01-01    0        0            1
# 3  3 1989-01-01    0        0            1
# 4  1 2020-02-10    1        1            2
# 5  2 1970-01-01    0        1            2
# 6  3 1995-01-01    1        1            2
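seq_along numbers rows in the order they appear, so the result matches chronological visit order only if each subject's rows are already sorted by date. If that is not guaranteed, a minimal base R sketch is to rank the dates within each ID instead. This assumes VISIT_DATE can be converted with as.Date(); the shuffled example data is hypothetical:

```r
# Hypothetical example: same dates as the question, but later visits listed first
dat <- data.frame(
  ID = c(1L, 2L, 3L, 1L, 2L, 3L),
  VISIT_DATE = as.Date(c("2020-02-10", "1970-01-01", "1995-01-01",
                         "2020-01-01", "1965-01-01", "1989-01-01"))
)

# Within each ID, rank the visits by date; ties.method = "first" keeps the
# numbers consecutive even if a subject has two visits on the same day
dat$VISIT_NUMBER <- as.integer(ave(as.numeric(dat$VISIT_DATE), dat$ID,
                                   FUN = function(d) rank(d, ties.method = "first")))
dat$VISIT_NUMBER
# [1] 2 2 2 1 1 1
```

The rows stay in their original order; only the numbering follows the calendar.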

dplyr

library(dplyr)
dat %>%
  group_by(ID) %>%                        # one group per subject
  mutate(VISIT_NUMBER = row_number()) %>% # 1, 2, ... within each group
  ungroup()
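row_number() with no argument also follows the current row order. A sketch, assuming VISIT_DATE is a Date and using hypothetical shuffled data, that passes the date to row_number() so visits are numbered chronologically without reordering the rows:

```r
library(dplyr)

# Hypothetical example: later visits listed before earlier ones
dat <- data.frame(
  ID = c(1L, 2L, 3L, 1L, 2L, 3L),
  VISIT_DATE = as.Date(c("2020-02-10", "1970-01-01", "1995-01-01",
                         "2020-01-01", "1965-01-01", "1989-01-01"))
)

# row_number(VISIT_DATE) ranks the dates within each ID (ties broken by row
# order), so the numbering follows the calendar while rows keep their order
dat <- dat %>%
  group_by(ID) %>%
  mutate(VISIT_NUMBER = row_number(VISIT_DATE)) %>%
  ungroup()
dat$VISIT_NUMBER
# [1] 2 2 2 1 1 1
```

Alternatively, arrange(ID, VISIT_DATE) before group_by() sorts the rows first, after which the plain row_number() version gives the same numbering.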

data.table

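library(data.table)
setDT(dat)  # convert the data.frame to a data.table by reference
dat[, VISIT_NUMBER := seq_len(.N), by = .(ID)]  # .N is the number of rows in the current ID group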

Data

dat <- structure(list(
  ID = c(1L, 2L, 3L, 1L, 2L, 3L),
  VISIT_DATE = c("2020-01-01", "1965-01-01", "1989-01-01", "2020-02-10", "1970-01-01", "1995-01-01"),
  COPD = c(1L, 0L, 0L, 1L, 0L, 1L),
  DIABETES = c(0L, 0L, 0L, 1L, 1L, 1L)
), row.names = c(NA, -6L), class = "data.frame")
r2evans

Another data.table option, using rowid():

> setDT(df)[, VISIT_NUMBER := rowid(ID)][]
   ID VISIT_DATE COPD DIABETES VISIT_NUMBER
1:  1 2020-01-01    1        0            1
2:  2 1965-01-01    0        0            1
3:  3 1989-01-01    0        0            1
4:  1 2020-02-10    1        1            2
5:  2 1970-01-01    0        1            2
6:  3 1995-01-01    1        1            2
ThomasIsCoding
    I'm still adopting `data.table`'s many helper funcs like this one. `rleid` is a favorite of mine :-) – r2evans Feb 26 '21 at 21:53