1

I have the following dataset:

structure(list(First.Name = c(5006L, 5006L, 5006L, 5006L, 5006L, 
5006L, 5006L, 5006L, 5006L, 5006L, 5006L, 5006L, 5006L, 5006L, 
5006L, 5006L, 5007L, 5007L, 5007L, 5007L, 5007L), Week_Year = structure(c(21L, 
21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 
23L, 23L, 17L, 17L, 18L, 18L, 18L), .Label = c("2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49"), class = "factor")), row.names = c(NA, -21L), groups = structure(list(
    First.Name = 5006:5007, .rows = structure(list(1:16, 17:21), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

which a subset looks like:

# A tibble: 21 x 2
# Groups:   First.Name [2]
First.Name Week_Year
<int> <fct> 
5006    23          
5006    23          
5006    24          
5006    24          
5006    24          
5006    24          
5007    18          
5007    18          
5007    19          
5007    19

I would like to mutate a third column that provides a count (per subject, there are hundreds in the real dataset) for each Week_Year repeating within weeks, not between. Each new subject will restart the Week_Count

Ideal output:

First.Name Week_Year Week_count
<int>  <int>  <int>
5006    23      6   
5006    23      6   
5006    24      7   
5006    24      7   
5006    24      7   
5006    24      7   
5007    18      1   
5007    18      1   
5007    19      2   
5007    19      2

I have been playing around with forms of the following but can only get it to work if I use the summarize function which isn't what I need.

df %>% 
  group_by(First.Name, Week_Year) %>% 
  mutate(Week_Count = row_number())
CanyonView
  • 401
  • 3
  • 15

1 Answers1

1

Since the values are sequential you can simply reindex based on the minimum value of Week_Year:

df %>% 
  dplyr::group_by(First.Name) %>% 
  dplyr::mutate(week_count = Week_Year - min(Week_Year) + 1) %>% 
  dplyr::ungroup()
LMc
  • 12,577
  • 3
  • 31
  • 43
  • Yes, thanks! Never heard of ```cur_group_id```, good to know – CanyonView Feb 09 '21 at 23:37
  • Actually, so when it moves to the next subject, instead of resetting the ``week_count`` to ``1``, it continues counting sequentially from the last number of the count for the previous subject. – CanyonView Feb 10 '21 at 00:20
  • Make sure your `group_by` your two grouping variables: `First.Name` and `Week_Year`. – LMc Feb 10 '21 at 01:09
  • That's what I had, just as you posted in the answer. It IDs ``First.Name`` 's Week_counts until week 7 for subject ``5006``, then starts subject ``5007`` at ``Week_Count`` ID 8, and so on. – CanyonView Feb 10 '21 at 01:13
  • You're right, the `cur_group_id` does not work with two grouping variables. Does the second option does not work? – LMc Feb 10 '21 at 01:55
  • It was not working with ``Week_Year`` as factor but converting to integer helped and now its accurate, thanks! – CanyonView Feb 10 '21 at 02:14