Yesterday I asked a similar question, "Count each next occurence of string in substring". Now I'm struggling with another one. Given a string like this:

apple.a > banana.b > banana.b > carrot-c > banana.b > apple.a > carrot-c > banana.b > apple.a

What I want to achieve is to number consecutive occurrences of each word, restarting the count whenever the word changes, so the result would be:

apple.a1 > banana.b1 > banana.b2 > carrot-c1 > banana.b1 > apple.a1 > carrot-c1 > banana.b1 > apple.a1

I have already tried several solutions:

Count consecutive TRUE values within each block separately

Counting the number of occurrences of a value in R

R: count consecutive occurrences of values in a single column

to list a few of them, but none of them worked for me and I couldn't achieve the desired result. I tried combining strsplit with unlist, sequence, rle and several other functions, but wasn't able to solve the problem.

To clear things up: the data frame has several columns, and the sequence of words is stored in one of them.

Henrik
Marcin
  • `lapply(strsplit(s, " > "), function(x) paste0(x, data.table::rowid(rleid(x)), collapse = " > "))` – Henrik Nov 14 '18 at 10:36
  • Yep, that's exactly what I have been looking for, thank you! – Marcin Nov 14 '18 at 10:41
  • I realized I used the `rowid(rleid(x))` in the post you looked at: [Count consecutive TRUE values within each block separately](https://stackoverflow.com/a/48552636/1851712) ;) – Henrik Nov 14 '18 at 11:16
  • Looks like you are... consistent in usage of functions ;-) – Marcin Nov 14 '18 at 11:28
  • I am sorry, I deleted my previous comment as it was pointing at how dumb I am, got myself another coffee and sorted it out. It seems like a long way for me until I learn how to use R properly. toString both in my "main" code and sample snippets messed things up, I thought it just casts each value to string, my bad, thank you for clarification though. Now it works really awesome and is going to provide me some really useful insights to my work, thank you! – Marcin Nov 14 '18 at 17:01
  • No problem, Marcin. You should not underestimate the wonders of an extra cup of coffee. Glad to hear that it works the way you wanted. – Henrik Nov 14 '18 at 17:03
  • No, it works great and got me really ahead with my work. Thank you for your time and patience, your help was priceless for me, I mean it. – Marcin Nov 14 '18 at 17:05

1 Answer

To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used: rleid assigns an id to each run of identical consecutive values, and rowid then counts the elements within each run.
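As a quick illustration of what the two functions return on a single vector, here is a minimal sketch. The vector `x` and the names `run_id` and `within_run` are made up for the example:

```r
library(data.table)

# Made-up example vector
x <- c("a", "a", "b", "b", "b", "a", "b", "b")

# rleid(): one id per run of identical consecutive values
run_id <- rleid(x)
run_id
# [1] 1 1 2 2 2 3 4 4

# rowid(): running count within each id, i.e. the position inside the run
within_run <- rowid(run_id)
within_run
# [1] 1 2 1 2 3 1 1 2
```

Pasting `x` and `within_run` together gives the numbered labels.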

Toy data with two strings of different length:

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
```
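If you would rather not depend on data.table, roughly the same thing can be sketched in base R with `rle` and `sequence`. Using `vapply` returns a character vector instead of a list, so it can be assigned straight back to a data frame column. The helper `count_runs`, the data frame `df` and its column names are hypothetical and only serve the example:

```r
# Base R sketch: number each element within its run of identical neighbours.
# count_runs() is a hypothetical helper, not part of any package.
count_runs <- function(x, sep = " > ") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    # rle(p)$lengths holds the run lengths; sequence() expands them to 1..n per run
    paste0(p, sequence(rle(p)$lengths), collapse = sep)
  }, character(1))
}

# Made-up data frame; the column names `id` and `path` are assumptions
df <- data.frame(
  id = 1:2,
  path = c(
    "apple.a > banana.b > banana.b > carrot-c > banana.b > apple.a > carrot-c > banana.b > apple.a",
    "c > c > b > b > b > c > c"
  ),
  stringsAsFactors = FALSE
)

df$path_counted <- count_runs(df$path)
df$path_counted[1]
# [1] "apple.a1 > banana.b1 > banana.b2 > carrot-c1 > banana.b1 > apple.a1 > carrot-c1 > banana.b1 > apple.a1"
```

The first row reproduces the result asked for in the question. The sketch assumes the separator is always exactly " > ".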
Henrik