How to assign sequential values to a variable in R while defining sequence by number of values contained in a different variable

Question

I have patient data where a patient was given the same assessment at different time points. I want to number those assessments sequentially by date.

Here's my input:

12 x 3 df with cols: pt_id, assess_date, assess_id

Here's my desired output:

12 x 5 df with cols: pt_id, assess_date, assess_id, num_assess, assess_num

Here's what I've tried:

data <- data %>% 
           group_by(pt_id) %>%
           mutate(num_assess <- n_distinct(assess_date))

data$assess_num <- NA

data <- data %>% 
           group_by(pt_id) %>% 
           for(i in 1:num_assess) {
              assess_num <- i
            }

I also tried using n_distinct to define the sequence directly, without first creating the assess_num variable, but that didn't work either.

Here's the error that I get:

Error in for (. in i) 1:num_assess : 4 arguments passed to 'for' which requires 3

Thoughts? TIA!

Hey tws061105, thanks for posting what you have attempted. It is also a good habit to post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). On that note, is assess_date a date or a string? If it is a date, you can extract the month with something like: `as.numeric(format(x, "%m"))` (assuming you want it to be numeric). — Andrew, Mar 02 '19 at 01:27
Hey Andrew - thanks for that suggestion! That definitely makes sense! I'll keep that in mind for future posts! — tws061105, Mar 02 '19 at 21:59

desc · Answer 1 · 2019-03-07T21:48:31.393

Here is a simplified version that treats your dates as factors and extracts the integer level code of each date:

library(dplyr)  # provides %>%, group_by() and mutate()

data.example = structure(list(pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L, 
  4567L, 4567L, 8900L, 8900L, 8900L, 8900L), assess_date = structure(c(1L, 
  2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/1/2019", 
  "1/2/2019", "1/3/2019", "1/4/2019"), class = "factor"), assess_id = c(64L, 
  64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)), class = "data.frame", row.names = c(NA, 
  -12L))

data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(assess_date))

If they aren't factors (yet), then:

data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))

The output looks like:

# A tibble: 12 x 4
# Groups:   pt_id [3]
   pt_id assess_date assess_id assess_num
   <int> <fct>           <int>      <int>
 1  1234 1/1/2019           64          1
 2  1234 1/2/2019           64          2
 3  1234 1/3/2019           64          3
 4  1234 1/4/2019           64          4
 5  4567 1/1/2019           64          1
 6  4567 1/2/2019           64          2
 7  4567 1/3/2019           64          3
 8  4567 1/4/2019           64          4
 9  8900 1/1/2019           64          1
10  8900 1/2/2019           64          2
11  8900 1/3/2019           64          3
12  8900 1/4/2019           64          4
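
One caveat worth checking before relying on factor codes: levels built from date strings are sorted as text, not chronologically. The sketch below uses made-up dates (not the asker's data) and assumes a month/day/year format to show the difference between ranking the raw strings and ranking parsed dates.

```r
# Hypothetical dates in m/d/Y format; note that "1/10/2019" is the latest one
dates_chr <- c("1/2/2019", "1/10/2019", "1/3/2019")

# Factor levels of strings are sorted alphabetically, so "1/10/2019" comes first
codes_from_strings <- as.integer(as.factor(dates_chr))
codes_from_strings
# 2 1 3

# Parsing to Date first gives chronological levels
dates <- as.Date(dates_chr, format = "%m/%d/%Y")
codes_from_dates <- as.integer(as.factor(dates))
codes_from_dates
# 1 3 2
```

With single-digit days, as in the example data, both approaches agree, which is why the output above looks correct.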

EDIT: Here is a more explicit set of potential solutions, depending on the class of the original assess_date column:

library(tidyr)
library(dplyr)

# data.example as tibble:
data.example = structure(list(pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L, 
  4567L, 4567L, 8900L, 8900L, 8900L, 8900L), assess_date = structure(c(1L, 
  2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/1/2019", 
  "1/2/2019", "1/3/2019", "1/4/2019"), class = "factor"), assess_id = c(64L, 
  64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)), row.names = c(NA, 
  -12L), class = c("tbl_df", "tbl", "data.frame"))

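# if assess_date is the string class
# (levels of strings sort alphabetically, so "1/10/2019" would rank before
# "1/2/2019"; parse with as.Date first if the order must be chronological):
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))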

# if assess_date is the factor class:
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(as.Date(assess_date,"%m/%d/%Y"))))

# if assess_date is the Date class:
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))
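
When patients are assessed on different days, one possible alternative is to parse the dates once and rank them within each patient. The following is a minimal sketch, not the answer's original method: the `visits` data frame is invented for illustration, and the m/d/Y date format is an assumption. It uses `n_distinct()` for num_assess and `dense_rank()` for assess_num, both from dplyr.

```r
library(dplyr)

# Hypothetical data: patients assessed on different days, unsorted, dates as strings
visits <- data.frame(
  pt_id = c(1234L, 1234L, 1234L, 4567L, 4567L),
  assess_date = c("1/10/2019", "1/2/2019", "2/5/2019", "3/1/2019", "1/15/2019"),
  assess_id = 64L,
  stringsAsFactors = FALSE
)

visits <- visits %>%
  # Parse the strings so that ordering is chronological (format is an assumption)
  mutate(assess_date = as.Date(assess_date, format = "%m/%d/%Y")) %>%
  group_by(pt_id) %>%
  mutate(
    num_assess = n_distinct(assess_date),  # number of distinct dates per patient
    assess_num = dense_rank(assess_date)   # 1 = earliest date for that patient
  ) %>%
  ungroup() %>%
  arrange(pt_id, assess_date)

visits
```

Note the `=` inside `mutate()`: writing `num_assess <- n_distinct(assess_date)` there does not create a column named num_assess. `dense_rank()` gives two assessments on the same date the same number; `row_number()` could be swapped in if every row should get its own number.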

@tws061105, the `L` denotes that the value will be [an integer](https://stackoverflow.com/a/24350749/3965651). You can create the reproducible data by taking all or part of your example data and using `dput` (e.g. `dput(mtcars)`). — desc, Mar 04 '19 at 22:02
Actually, this doesn't quite work. It worked when I just ran the proposed solution, but when I apply it to my actual data, it doesn't quite work. I see that this proposed solution establishes the "level" of the assessment date as assess_date = structure(c(1L,2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), and then labels those with the dates for those assessments, but in my actual data, I can't easily convert the dates to an integer, and converting the whole column to a factor doesn't quite work either, because patients can be assessed on different days (there's no one set of assessment dates) — tws061105, Mar 07 '19 at 21:06
Thanks for the help, and with your patience with me posting a simplified version of my data that is less-than-ideal and not reproducible :/ — tws061105, Mar 07 '19 at 21:07
@tws061105, check out the edits to see if they solve your issue. You shouldn't need to do any mapping from `Date` to `integer`. If the `assess_date` column class is a `factor`, or you convert it to one as described above, that will map the dates for you — desc, Mar 07 '19 at 21:50

score 1 · Answer 2 · answered Mar 02 '19 at 02:41

Clever solution from @desc. If your column is stored as a date and you want a numeric value, the script below works. It uses the data.example from desc (thank you), but reads the dates as d/m/y, which is why the format in as.Date is "%d/%m/%Y". It then extracts the month number, so it relies on each assessment falling in a different month.

> data.example = structure(list(pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L, 
+   4567L, 4567L, 8900L, 8900L, 8900L, 8900L), assess_date = structure(c(1L, 
+   2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/1/2019", 
+   "1/2/2019", "1/3/2019", "1/4/2019"), class = "factor"), assess_id = c(64L, 
+   64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)), class = "data.frame", row.names = c(NA, 
+   -12L))
> 
> data.example$assess_date <- as.Date(data.example$assess_date, format = "%d/%m/%Y")
> data.example$assess_num <- as.numeric(format(data.example$assess_date, "%m"))
> data.example
   pt_id assess_date assess_id assess_num
1   1234  2019-01-01        64          1
2   1234  2019-02-01        64          2
3   1234  2019-03-01        64          3
4   1234  2019-04-01        64          4
5   4567  2019-01-01        64          1
6   4567  2019-02-01        64          2
7   4567  2019-03-01        64          3
8   4567  2019-04-01        64          4
9   8900  2019-01-01        64          1
10  8900  2019-02-01        64          2
11  8900  2019-03-01        64          3
12  8900  2019-04-01        64          4

Thanks @Andrew! This looks like it is reliant upon the assessments occurring in different months, which isn't always the case for me (but which I recognize is consistent with the example that I provided). Thanks for weighing in, though, and thanks for the critique re: posting reproducible examples — tws061105, Mar 04 '19 at 21:53
Sure thing, and I am glad you found a solution that works for your data! Also, it is customary to hit the check-mark next to the answer which solves your problem if your issue is resolved (i.e., to accept desc's answer). Thanks for following up too! — Andrew, Mar 05 '19 at 13:43

score 0 · Accepted Answer · answered May 15 '19 at 19:13

Many thanks for the suggestions. Unfortunately, I couldn't get any of the suggested solutions to work, but I did find exactly what I needed in the getanID function from the splitstackshape package, using the following code:

library(splitstackshape)
getanID(data, "pt_id")

It worked like a charm!
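
For readers without splitstackshape, a similar within-patient counter can be sketched in base R. This is an illustration on invented data, not the code the asker ran. It assumes the counter should follow row order within each pt_id, so the rows are sorted by patient and date first; the same sorting is probably worth doing before calling getanID if its numbering follows row order, which is an assumption here.

```r
# Hypothetical data, deliberately out of order
visits <- data.frame(
  pt_id = c(8900L, 1234L, 1234L, 8900L, 1234L),
  assess_date = as.Date(c("2019-03-01", "2019-01-10", "2019-01-02",
                          "2019-01-15", "2019-02-05"))
)

# Sort by patient, then by date, so the counter follows chronological order
visits <- visits[order(visits$pt_id, visits$assess_date), ]

# Number the rows 1, 2, 3, ... within each patient
visits$assess_num <- ave(seq_along(visits$pt_id), visits$pt_id, FUN = seq_along)

# Total number of assessment rows per patient
visits$num_assess <- ave(visits$assess_num, visits$pt_id, FUN = length)

visits
```

Unlike a rank of distinct dates, this counter gives every row its own number, so two assessments on the same date would be numbered separately.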
