R function to identify unique rows from previous rows, not within the dataframe altogether

Question

I have a dataframe in which I need to identify or index the start of each new trial. A new trial is indicated by a change in the variable Location, which takes values from 0 to 8. Example below:

    zPos        Location
    1.9148150   6
    1.914815    6
    1.914815    6
    1.914815    6
    1.914815    6
    0.9018518   3
    0.9018518   3
    0.9009259   3
    0.9009259   3
    0.9009259   3
    0.9009259   3

There are 72 trials in each dataframe, so each Location value recurs in 8 separate trials, which means `unique` won't work. I am a novice when it comes to R, so I haven't tried much outside of base R and dplyr to tackle this problem.

Ideally I would like to create a new variable for trial number, example below:

    zPos        Location       TrialNum
    1.9148150   6              1
    1.914815    6              1
    1.914815    6              1
    1.914815    6              1
    1.914815    6              1
    0.9018518   3              2
    0.9018518   3              2
    0.9009259   3              2
    0.9009259   3              2
    0.9009259   3              2
    0.9009259   3              2

But I could also work with an index of the starting location for each new trial rather than a new variable in the dataframe.

This is my first question on stackoverflow, so I greatly appreciate any assistance or insight.

Linking a possible duplicate which can give you alternatives [How to create a consecutive index based on a grouping variable in a dataframe](https://stackoverflow.com/questions/6112803/how-to-create-a-consecutive-index-based-on-a-grouping-variable-in-a-dataframe) — deepseefan, Sep 05 '19 at 13:10

score 2 · Accepted Answer · answered Sep 05 '19 at 12:55

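You could use `rle` (run-length encoding) to do this. It finds each run of consecutive identical Location values, and numbering those runs gives the trial number.

library(dplyr)  # provides %>% and mutate()

df <- data.frame(
  zPos = c(1.9148150, 1.914815, 1.914815, 1.914815, 1.914815, 0.9018518,
           0.9018518, 0.9009259, 0.9009259, 0.9009259, 0.9009259),
  Location = c(6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3)
)

# Number each run of consecutive identical values: 1, 1, ..., 2, 2, ...
get_trial <- function(col) {
  r <- rle(col)
  rep(seq_along(r$lengths), r$lengths)
}

df %>%
  mutate(TrialNum = get_trial(Location))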

        zPos Location TrialNum
1  1.9148150        6        1
2  1.9148150        6        1
3  1.9148150        6        1
4  1.9148150        6        1
5  1.9148150        6        1
6  0.9018518        3        2
7  0.9018518        3        2
8  0.9009259        3        2
9  0.9009259        3        2
10 0.9009259        3        2
11 0.9009259        3        2
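
The question also mentions that an index of each trial's starting row would do instead of a new column. A minimal base R sketch of that, reusing the same `rle` run lengths (the name `trial_starts` is only illustrative and not part of the original answer):

```r
# Same Location values as in the example data
Location <- c(6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3)

r <- rle(Location)

# The first run starts at row 1; every later run starts one row after
# the combined length of all runs before it.
trial_starts <- cumsum(c(1, head(r$lengths, -1)))

trial_starts  # 1 6
```

These indices can then be used directly, e.g. `df[trial_starts, ]` to pull the first row of each trial.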

There is also `data.table::rleid`, which will give the same output as `get_trial` here. — IceCreamToucan, Sep 05 '19 at 13:49
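
A short sketch of the `rleid` alternative mentioned in the comment above, assuming the data.table package is installed; the vector is the Location column from the example data:

```r
library(data.table)  # assumed to be installed; provides rleid()

Location <- c(6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3)

# rleid() assigns one integer id per run of consecutive identical values
TrialNum <- rleid(Location)

TrialNum  # 1 1 1 1 1 2 2 2 2 2 2
```

Inside a dplyr pipeline this would presumably be used as `mutate(TrialNum = data.table::rleid(Location))`.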

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

This works as long as no Location value recurs in a later trial:

df$iTrialNum <- match(df$Location, unique(df$Location))

# -------------------------------------------------------------------------
#         zPos Location iTrialNum
# 1  1.9148150        6         1
# 2  1.9148150        6         1
# 3  1.9148150        6         1
# 4  1.9148150        6         1
# 5  1.9148150        6         1
# 6  0.9018518        3         2
# 7  0.9018518        3         2
# 8  0.9009259        3         2
# 9  0.9009259        3         2
# 10 0.9009259        3         2
# 11 0.9009259        3         2

Sample data (df)

dput(df)
structure(list(zPos = c(1.914815, 1.914815, 1.914815, 1.914815, 
1.914815, 0.9018518, 0.9018518, 0.9009259, 0.9009259, 0.9009259, 
0.9009259), Location = c(6L, 6L, 6L, 6L, 6L, 3L, 3L, 3L, 3L, 
3L, 3L)), class = "data.frame", row.names = c(NA, -11L))

This will work only when the location number does not repeat (i.e. what OP specifically wants to avoid) — erocoar, Sep 05 '19 at 13:18
@erocoar, I believe the solution by @Jaap (in the link I shared above) can address this problem. `x <- rle(df$Location)$lengths` followed by `df$TrialNum<- rep(seq_along(x), times=x)`. — deepseefan, Sep 05 '19 at 14:13
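
To make the limitation raised in these comments concrete, here is a small sketch on made-up data (not the asker's) in which Location 6 comes back for a later trial. It compares the `match`/`unique` approach with the `rle` approach quoted in the comment above:

```r
# Hypothetical data: Location 6 is used again for a third trial
Location <- c(6, 6, 3, 3, 6, 6)

# match/unique numbers each distinct value, so the repeat gets the old number
by_match <- match(Location, unique(Location))

# rle numbers each run of consecutive identical values
r <- rle(Location)
by_rle <- rep(seq_along(r$lengths), times = r$lengths)

by_match  # 1 1 2 2 1 1
by_rle    # 1 1 2 2 3 3
```

Only the run-based numbering treats the second block of 6s as a new trial, which is the behaviour the question asks for.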
