I have a very long data frame (over 300 000 rows) containing all subjects' trials of a task, in long format. About 300 rows are successive trials for one subject, followed by the next subject's trials underneath, and so on. What I want to do is create a new column with trial numbers for every subject.

Such as:

subject trial_number
101     1
101     2
101     3
102     1
102     2
102     3

I am thinking I should somehow make R pick a subject number from the subject column, create an ascending sequence for it, and then loop this over all the subject numbers. But I have not been able to figure out how to loop over subject numbers while also building an ascending sequence in one and the same column across these subjects. The different ways of creating new columns I have seen are all based on calculations on, or values in, other columns, whereas for me the values in the new column are not calculated from the values of another column.

I also thought of splitting the data frame into smaller ones based on subject number, creating ascending sequences and then merging them again. That seems like a very inefficient way to do it, though.

I don't have example code for failed attempts or so, as I haven't been able to figure out how to structure this. I'm thinking some kind of combination of subset and within? Or are there better solutions my googling skills haven't helped me find yet?

www
  • Adapting the top answer at the dupe for you (base R): `df$trial_number <- ave(df$subject, df$subject, FUN = seq_along)`. Packages `dplyr` or `data.table` make doing things "by group" very simple, you may want to look into using one of them for this and other operations. – Gregor Thomas Feb 13 '19 at 16:06

1 Answer

Use dplyr.

library(dplyr)

dat2 <- dat %>%
  group_by(subject) %>%
  mutate(trial_number = 1:n()) %>%
  ungroup()
dat2
#   subject trial_number
#     <int>        <int>
# 1     101            1
# 2     101            2
# 3     101            3
# 4     102            1
# 5     102            2
# 6     102            3

Or

dat2 <- dat %>%
  group_by(subject) %>%
  mutate(trial_number = row_number()) %>%
  ungroup()
dat2
#   subject trial_number
#     <int>        <int>
# 1     101            1
# 2     101            2
# 3     101            3
# 4     102            1
# 5     102            2
# 6     102            3

Or data.table

library(data.table)

setDT(dat)

dat[, trial_number := seq_len(.N), by = subject][]
#    subject trial_number
# 1:     101            1
# 2:     101            2
# 3:     101            3
# 4:     102            1
# 5:     102            2
# 6:     102            3

Or rowid or rowidv in data.table.

library(data.table)

setDT(dat)

dat[, trial_number := rowidv(dat, cols = "subject")][]
#    subject trial_number
# 1:     101            1
# 2:     101            2
# 3:     101            3
# 4:     102            1
# 5:     102            2
# 6:     102            3

library(data.table)

setDT(dat)

dat[, trial_number := rowid(dat$subject)][]
#    subject trial_number
# 1:     101            1
# 2:     101            2
# 3:     101            3
# 4:     102            1
# 5:     102            2
# 6:     102            3

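Or base R with tapply and unlist. Note that this assumes the rows are already sorted by subject, because tapply returns the groups in sorted order.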

dat2 <- dat
dat2$trial_number <- unlist(tapply(dat$subject, dat$subject, seq_along))
dat2
#   subject trial_number
# 1     101            1
# 2     101            2
# 3     101            3
# 4     102            1
# 5     102            2
# 6     102            3
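
If the rows might not be sorted by subject, base R's ave() is a safer choice, because it writes each group's values back to the original row positions. The sketch below uses a small made-up data frame (dat_unsorted is a hypothetical name, not part of the question's data) in which subject 102 comes before subject 101, and compares the two base R approaches.

```r
# Hypothetical example data: subjects deliberately NOT in sorted order
dat_unsorted <- data.frame(subject = c(102, 102, 102, 101, 101))

# tapply() returns the groups in sorted order (101 first, then 102),
# so the unlisted result no longer lines up with the rows
misaligned <- unname(unlist(tapply(dat_unsorted$subject,
                                   dat_unsorted$subject,
                                   seq_along)))
misaligned
# 1 2 1 2 3

# ave() puts each group's sequence back in the original row positions
dat_unsorted$trial_number <- ave(dat_unsorted$subject,
                                 dat_unsorted$subject,
                                 FUN = seq_along)
dat_unsorted$trial_number
# 1 2 3 1 2
```

For data that is already sorted by subject, as in the question, both approaches give the same result.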

Data

dat <- read.table(text = "subject
    101
    101
    101
    102
    102
    102 ", header = TRUE)
www
  • I used dplyr and did as in your first example; it worked perfectly. Wish I had known about the mutate function much earlier in life, thank you! – Viola Hollestein Feb 14 '19 at 08:47