R: how to create ID number for a variable for each subject

Question

My data looks like this:

Subject Time Duration 
1       0.1s   0.006
1       0.5s   0
1       1s     0.733
..
2       0.3s   0
2       0.5s   0.553
2       2s     0.344
..

I want to order variable Duration according to variable Time for each Subject, excluding the 0 values.

I want to create an additional ordinal number ID variable, which will be ordered according to Time. So that my data looks like this:

Subject Time Duration ID
1       0.1s   0.006  1
1       0.5s   0      NA
1       1s     0.733  2
..
2       0.3s   0      NA
2       0.5s   0.553  1
2       2s     0.344  2
..

I tried dplyr::mutate(dataframe1, ID = row_number()) but I don't know how to mutate for every subject and how to exclude 0 or NA values in Duration variable. Could anyone help me solve the problem? Thank you.

akrun · Answer 1 · 2020-03-09T20:19:16.747

We can use rowid from data.table

library(data.table)
df1$ID <- rowid(df1$Subject)

If it needs to be ordered by 'Time'

library(data.table)
setDT(df1)[order(Subject, as.numeric(sub("[^0-9.]+", "", Time))), 
           ID := rowid(Subject)]
df1
#   Subject Time Duration ID
#1:       1 0.1s    0,006  1
#2:       1 0.5s    0,663  2
#3:       1   1s    0,733  3
#4:       2 0.3s    0,002  1
#5:       2 0.5s    0,553  2
#6:       2   2s    0,344  3

Or using base R

df1$ID <- with(df1, ave(seq_along(Subject), Subject, FUN = seq_along))

Or in tidyverse

library(dplyr) 
df1 %>%
    group_by(Subject) %>%
    mutate(ID = row_number(readr::parse_number(Time)))
# A tibble: 6 x 4
# Groups:   Subject [2]
#  Subject Time  Duration    ID
#    <int> <chr> <chr>    <int>
#1       1 0.1s  0,006        1
#2       1 0.5s  0,663        2
#3       1 1s    0,733        3
#4       2 0.3s  0,002        1
#5       2 0.5s  0,553        2
#6       2 2s    0,344        3

Update

For the updated question

df2 %>%
    group_by(Subject) %>%
    mutate(ID = replace(rep(NA_integer_, n()), Duration > 0, 
         row_number(readr::parse_number(Time[Duration > 0]))))
# A tibble: 6 x 4
# Groups:   Subject [2]
#  Subject Time  Duration    ID
#    <int> <chr>    <dbl> <int>
#1       1 0.1s     0.006     1
#2       1 0.5s     0        NA
#3       1 1s       0.733     2
#4       2 0.3s     0        NA
#5       2 0.5s     0.553     1
#6       2 2s       0.344     2

data

df1 <- structure(list(Subject = c(1L, 1L, 1L, 2L, 2L, 2L), Time = c("0.1s", 
"0.5s", "1s", "0.3s", "0.5s", "2s"), Duration = c("0,006", "0,663", 
"0,733", "0,002", "0,553", "0,344")), class = "data.frame", row.names = c(NA, 
-6L))




df2 <-structure(list(Subject = c(1L, 1L, 1L, 2L, 2L, 2L), Time = c("0.1s", 
"0.5s", "1s", "0.3s", "0.5s", "2s"), Duration = c(0.006, 0, 0.733, 
0, 0.553, 0.344)), class = "data.frame", row.names = c(NA, -6L
))

For the Tidyverse answer, do not forget to arrange the times before assigning the ID. — Justace Clutter, Mar 09 '20 at 18:49
Thank you, but none of them work for me, I want to sort Duration variable according to Time variable for each subject. — Tk Shmk, Mar 09 '20 at 19:18
@TkShmk Based on the example you sshowed, I get the expected output though — akrun, Mar 09 '20 at 19:19
@TkShmk please note that your 'Duration', 'Time' are not numeric columns i.e. `,` is not considered numeric — akrun, Mar 09 '20 at 19:21
I corrected my question. Can I somehow specify that I only want to sort Duration variable values that are >0? — Tk Shmk, Mar 09 '20 at 19:50

R: how to create ID number for a variable for each subject

1 Answers1

Update

data