I have a dataframe in which I need to identify or index the start of each new trial. A new trial begins whenever the variable Location, which takes values from 0 to 8, changes. Example below:

    zPos        Location
    1.9148150   6
    1.914815    6
    1.914815    6
    1.914815    6
    1.914815    6
    0.9018518   3
    0.9018518   3
    0.9009259   3
    0.9009259   3
    0.9009259   3
    0.9009259   3

There are 72 trials in each dataframe, so each Location value is used by 8 different trials, which means `unique` won't work. I am a novice when it comes to R, so I haven't tried much outside of base R and dplyr to tackle this problem.

Ideally I would like to create a new variable for trial number, example below:

    zPos        Location       TrialNum
    1.9148150   6              1
    1.914815    6              1
    1.914815    6              1
    1.914815    6              1
    1.914815    6              1
    0.9018518   3              2
    0.9018518   3              2
    0.9009259   3              2
    0.9009259   3              2
    0.9009259   3              2
    0.9009259   3              2

But I could also work with an index of the starting location for each new trial rather than a new variable in the dataframe.

This is my first question on Stack Overflow, so I greatly appreciate any assistance or insight.

hannahd36
  • Linking a possible duplicate which can give you alternatives [How to create a consecutive index based on a grouping variable in a dataframe](https://stackoverflow.com/questions/6112803/how-to-create-a-consecutive-index-based-on-a-grouping-variable-in-a-dataframe) – deepseefan Sep 05 '19 at 13:10

2 Answers

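You could use `rle` (run-length encoding) to do this. It finds each run of consecutive identical Location values, and numbering those runs gives the trial number.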

    library(dplyr)  # provides %>% and mutate()

    df <- data.frame(
      zPos = c(1.9148150, 1.914815, 1.914815, 1.914815, 1.914815, 0.9018518,
               0.9018518, 0.9009259, 0.9009259, 0.9009259, 0.9009259),
      Location = c(6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3)
    )

    # Number each run of consecutive identical values: 1, 2, 3, ...
    get_trial <- function(col) {
      r <- rle(col)
      rep(seq_along(r$lengths), r$lengths)
    }

    df %>%
      mutate(TrialNum = get_trial(Location))

            zPos Location TrialNum
    1  1.9148150        6        1
    2  1.9148150        6        1
    3  1.9148150        6        1
    4  1.9148150        6        1
    5  1.9148150        6        1
    6  0.9018518        3        2
    7  0.9018518        3        2
    8  0.9009259        3        2
    9  0.9009259        3        2
    10 0.9009259        3        2
    11 0.9009259        3        2
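
The question also mentions that an index of where each trial starts would be enough. A minimal base R sketch of that, built on the same `rle` result, could look like the following; `location`, `r` and `trial_start` are illustrative names, not part of the original answer.

```r
# Location values from the question's example
location <- c(6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3)

r <- rle(location)
r$values   # the value of each run: 6 3
r$lengths  # the length of each run: 5 6

# First row of each trial: row 1, then 1 plus the total length of all earlier runs
trial_start <- cumsum(c(1, head(r$lengths, -1)))
trial_start  # 1 6
```

`df[trial_start, ]` would then pull out the first row of every trial.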
erocoar

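This can work, provided each Location value belongs to only one trial (see the comments below for the case where locations repeat):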

    df$iTrialNum <- match(df$Location, unique(df$Location))

    # -------------------------------------------------------------------------
    #         zPos Location iTrialNum
    # 1  1.9148150        6         1
    # 2  1.9148150        6         1
    # 3  1.9148150        6         1
    # 4  1.9148150        6         1
    # 5  1.9148150        6         1
    # 6  0.9018518        3         2
    # 7  0.9018518        3         2
    # 8  0.9009259        3         2
    # 9  0.9009259        3         2
    # 10 0.9009259        3         2
    # 11 0.9009259        3         2

Sample data (df)

    dput(df)
    structure(list(zPos = c(1.914815, 1.914815, 1.914815, 1.914815,
    1.914815, 0.9018518, 0.9018518, 0.9009259, 0.9009259, 0.9009259,
    0.9009259), Location = c(6L, 6L, 6L, 6L, 6L, 3L, 3L, 3L, 3L,
    3L, 3L)), class = "data.frame", row.names = c(NA, -11L))
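
To see what happens when a location comes back in a later trial, here is a small sketch on made-up data (the `location` vector below is hypothetical, not the asker's). It compares the `match`/`unique` labels with a base R alternative that counts change points; `by_value`, `is_start` and `trial_num` are illustrative names.

```r
# Hypothetical data: Location 6 is used again by a third trial
location <- c(6, 6, 3, 3, 3, 6, 6)

# match() + unique() labels rows by value, so the third trial gets merged into the first
by_value <- match(location, unique(location))
by_value  # 1 1 2 2 2 1 1

# Flag every row whose Location differs from the previous row, then count the flags
is_start <- c(TRUE, diff(location) != 0)
trial_num <- cumsum(is_start)
trial_num  # 1 1 2 2 2 3 3
```

`diff` assumes a numeric Location; for a character or factor column, comparing `location[-1] != head(location, -1)` should serve the same purpose. As a side effect, `which(is_start)` gives the starting row of each trial.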
deepseefan
  • This will work only when no location number repeats, and repeated locations are exactly the case the OP says `unique` can't handle – erocoar Sep 05 '19 at 13:18
  • @erocoar, I believe the solution by @Jaap (in the link I shared above) can address this problem. `x <- rle(df$Location)$lengths` followed by `df$TrialNum <- rep(seq_along(x), times = x)`. – deepseefan Sep 05 '19 at 14:13
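
For reference, the two lines from the last comment can be run as follows. This is only a sketch: the data frame is a made-up one in which Location 6 is used by two separate trials, and the final lines are a suggested sanity check rather than something from the thread.

```r
# Hypothetical data frame: Location 6 appears in two different trials
df <- data.frame(
  zPos = c(1.91, 1.91, 0.90, 0.90, 0.90, 1.92, 1.92),
  Location = c(6, 6, 3, 3, 3, 6, 6)
)

# Lengths of the runs of identical Location values, one entry per trial
x <- rle(df$Location)$lengths
df$TrialNum <- rep(seq_along(x), times = x)

# Sanity check: rows per trial, and the number of trials that were found
table(df$TrialNum)
length(x)  # 3
```

On the asker's real data, `length(x)` should come out as 72 if every change in Location marks a new trial.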