I'm trying to count how many times a given character is repeated consecutively and use that count in the replacement string. Here's an example:

before = c("w","www","answer","test","wwwxww")
after = c("w{1}","w{3}","answ{1}er","test","w{3}xw{2}")

Is there a simple way to achieve this, for instance by combining gsub and a regex? My attempt so far:

before = c("w","www","answer","test")
after = gsub("w+", "w{n}", before)

Result:

after = c("w{n}","w{n}","answ{n}er","test")

The idea is to replace n with the exact number of occurrences in each run.

Maël
blofeld _
    See related post to get counts for all letters: https://stackoverflow.com/questions/18969698/collapse-vector-to-string-of-characters-with-respective-numbers-of-consequtive-o – zx8754 Apr 25 '23 at 10:40

4 Answers

Similar to @Sotos's answer, but using stringr functions.

library(stringr)

str_replace_all(before, 'w+', function(x) str_c('w{', str_count(x, 'w'), '}'))
#[1] "w{1}"      "w{3}"      "answ{1}er" "test"      "w{3}xw{2}"
Ronak Shah

An easy way is to use the gsubfn package, which lets the replacement be a function of each match, i.e.

library(gsubfn)
before = c("w","www","answer","test","wwwxww")

gsubfn('w+', function(i)paste0('w{', nchar(i),'}'), before)
# [1] "w{1}"      "w{3}"      "answ{1}er" "test"      "w{3}xw{2}"
Sotos

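A base R approach using rle():

word <- "w"

sapply(before, function(x) {
  # Run-length encode the individual characters of the string
  tmp <- rle(unlist(strsplit(x, "")))
  paste0(
    ifelse(
      tmp$values == word,
      paste0(word, "{", tmp$lengths, "}"),  # runs of `word` become word{n}
      strrep(tmp$values, tmp$lengths)       # all other runs are restored as-is
    ),
    collapse = ""
  )
})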

          w         www      answer        test      wwwxww 
     "w{1}"      "w{3}" "answ{1}er"      "test" "w{3}xw{2}"
user2974951

A base R way, using gregexpr to find each run of w and the regmatches replacement function to substitute each match with its match length. Note that this modifies before in place.

x <- gregexpr("w+", before)
regmatches(before, x) <- lapply(x, \(y) paste0("w{", attr(y, "match.length"), "}"))
before
#[1] "w{1}"      "w{3}"      "answ{1}er" "test"      "w{3}xw{2}"
GKi
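
For reference, the gregexpr/regmatches approach can be wrapped in a small function so that the input vector is left untouched. This is only a sketch and not part of any of the answers above: the name count_runs and its char argument are hypothetical, and it assumes char is a single literal character with no special meaning in a regex.

```r
# Hypothetical helper (name and signature are assumptions for illustration).
# Replaces each run of `char` in `x` with char{n}, where n is the run length.
count_runs <- function(x, char = "w") {
  # Assumes `char` is one literal character that needs no regex escaping
  m <- gregexpr(paste0(char, "+"), x)
  # Assigning into regmatches() only changes the local copy of `x`
  regmatches(x, m) <- lapply(m, function(y) {
    paste0(char, "{", attr(y, "match.length"), "}")
  })
  x
}

before <- c("w", "www", "answer", "test", "wwwxww")
after <- count_runs(before)
after
#[1] "w{1}"      "w{3}"      "answ{1}er" "test"      "w{3}xw{2}"
```

Because the assignment happens inside the function, before keeps its original values and the result is returned as a new vector.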