2

This is a very simple problem, but I have the following data:

> head(Session_numbers)
   ID Session 
1  1 1_43392 
2  1 1_43392 
3  1 1_43392 
4  1 1_43394 
5  1 1_43394 
6  1 1_43394 
7  1 1_43398 
8  1 1_43401 
9  2 2_44502 
10 2 2_44502 

where ID is the grouping label per subject, and every row has a session code, which corresponds to points in time. I want to number the session codes sequentially in a variable 'Snum' so that each identical session code per ID gets grouped and is given the same number, such as:

   ID Session Snum
1  1 1_43392 1
2  1 1_43392 1
3  1 1_43392 1
4  1 1_43394 2
5  1 1_43394 2
6  1 1_43394 2
7  1 1_43398 3
8  1 1_43401 4
9  2 2_44502 1
10 2 2_44502 1

The number of Sessions per ID differs, and every Session code is unique.

I have tried to use ave, dplyr and data.table but I just can't seem to get it right, e.g.:

 DT <- data.table(Session_numbers)
 DT[, Snum := seq_len(.N), by = list(ID, Session)]

 > head(DT)
    ID Session Snum
 1:  1 1_43392    1
 2:  1 1_43392    2
 3:  1 1_43392    3
 4:  1 1_43394    1
 5:  1 1_43394    2
 6:  1 1_43394    3

Or using dplyr, with the following code which gives me an error message:

> Session_numbers %>%
 +   group_by(ID, Session) %>%
 +   mutate(Snum = row_number())
 Error: row_number() should only be called in a data context
 Call `rlang::last_error()` to see a backtrace

or with ave

head(Session_numbers)
  ID Session num
1  1 1_43392   1
2  1 1_43392   2
3  1 1_43392   3
4  1 1_43394   1
5  1 1_43394   2
6  1 1_43394   3

My question is similar to this one: Count number of observations/rows per group and add result to data frame

What am I missing?

Clarius333
  • 93
  • 9
  • What is the `packageVersion('dplyr')` The error you showed is not reproducible to me – akrun Nov 06 '19 at 18:04
  • 1
    My `dplyr` package was `packageVersion('dplyr') [1] ‘0.8.3’`, after installing `rlang`, I don't get this error anymore – Clarius333 Nov 06 '19 at 18:14
  • 1
    While installing `dplyr`, it should have automatically iinstalled, don't know why `rlang` didn't get installed. Anyway, all is well now – akrun Nov 06 '19 at 18:15
  • [How to assign a unique ID number to each group of identical values in a column](https://stackoverflow.com/questions/24119599/how-to-assign-a-unique-id-number-to-each-group-of-identical-values-in-a-column); [Add ID column by group](https://stackoverflow.com/questions/13566562/add-id-column-by-group); [How to create a consecutive index based on a grouping variable in a dataframe](https://stackoverflow.com/questions/6112803/how-to-create-a-consecutive-index-based-on-a-grouping-variable-in-a-dataframe). – Henrik Nov 06 '19 at 18:40

2 Answers2

2

Here is an option where we group by 'ID' and match the 'Session' with unique elements of 'Session' to get the index

library(dplyr) 
Session_numbers %>%
  group_by(ID) %>%
  mutate(num = match(Session, unique(Session)))
# A tibble: 10 x 3
# Groups:   ID [2]
#      ID Session   num
#   <int> <chr>   <int>
# 1     1 1_43392     1
# 2     1 1_43392     1
# 3     1 1_43392     1
# 4     1 1_43394     2
# 5     1 1_43394     2
# 6     1 1_43394     2
# 7     1 1_43398     3
# 8     1 1_43401     4
# 9     2 2_44502     1
#10     2 2_44502     1

Or using base R

Session_numbers$num <- with(Session_numbers, ave(Session, ID, FUN = 
              function(x) match(x, unique(x))))

If we are interested in changing the non-adjacent elements to new number, then

library(data.table)
...
 %>% mutate(num = rleid(Session))

data

Session_numbers <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), 
    Session = c("1_43392", "1_43392", "1_43392", "1_43394", "1_43394", 
    "1_43394", "1_43398", "1_43401", "2_44502", "2_44502")), 
    class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you for your quick reply! I tried to replicate this, but what I am now seeing is that each numbering keeps adding, as in output below. This is using the exact dplyr code and generating new data. I don't know why? # A tibble: 10 x 3 # Groups: ID [2] ID Session num * 1 1 1_43392 1 2 1 1_43392 1 3 1 1_43392 1 4 1 1_43394 2 5 1 1_43394 2 6 1 1_43394 2 7 1 1_43398 3 8 1 1_43401 4 9 2 2_44502 5 10 2 2_44502 5 – Clarius333 Nov 06 '19 at 18:02
  • @Clarius333 not clear from the comments. I copied your data and made a reproducible example – akrun Nov 06 '19 at 18:03
  • 1
    @Clarius333 I used `packageVersion('dplyr')# [1] ‘0.8.3’ > packageVersion('rlang') [1] ‘0.4.0’` – akrun Nov 06 '19 at 18:08
  • 1
    I was missing package 'rlang' ... thank you so much! – Clarius333 Nov 06 '19 at 18:11
1

Another dplyr option could be:

df %>%
 group_by(ID) %>%
 mutate(Snum = cumsum(!duplicated(Session)))

      ID Session  Snum
   <int> <chr>   <int>
 1     1 1_43392     1
 2     1 1_43392     1
 3     1 1_43392     1
 4     1 1_43394     2
 5     1 1_43394     2
 6     1 1_43394     2
 7     1 1_43398     3
 8     1 1_43401     4
 9     2 2_44502     1
10     2 2_44502     1
tmfmnk
  • 38,881
  • 4
  • 47
  • 67