
Is there a way to replace a character only if it is not repeating, or repeating a certain number of times?

str = c("ddaabb", "daabb", "aaddbb", "aadbb")
gsub("d{1}", "c", str)
[1] "ccaabb" "caabb"  "aaccbb" "aacbb" 

#Expected output
[1] "ddaabb" "caabb"  "aaddbb" "aacbb" 
Wiktor Stribiżew
Maël

2 Answers


You can use negative lookarounds in your regex to exclude cases where d is preceded or followed by another d:

gsub("(?<!d)d(?!d)", "c", str, perl=TRUE)

Edit: added perl=TRUE as suggested by OP, since the lookarounds need the PCRE engine. For more info about the regex engines in R, see this question.
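The same lookaround idea can be extended to the "repeating a certain number of times" part of the question by putting an exact quantifier between the two lookarounds. The following is only a sketch: the helper name `replace_exact_run` and its arguments are hypothetical, and it assumes `char` is a single literal character that is not a regex metacharacter.

```r
# Hypothetical helper: replace runs of `char` that are exactly `n` long.
# Assumes `char` is one literal, non-metacharacter symbol (e.g. "d").
replace_exact_run <- function(x, char, n, replacement) {
  # (?<!d)d{n}(?!d): exactly n copies, with no further copy on either side
  pattern <- sprintf("(?<!%s)%s{%d}(?!%s)", char, char, as.integer(n), char)
  # Repeat the replacement n times so the string length is preserved
  gsub(pattern, strrep(replacement, n), x, perl = TRUE)
}

x <- c("ddaabb", "daabb", "aaddbb", "aadbb")  # same data as in the question

replace_exact_run(x, "d", 1, "c")
# [1] "ddaabb" "caabb"  "aaddbb" "aacbb"
replace_exact_run(x, "d", 2, "c")
# [1] "ccaabb" "daabb"  "aaccbb" "aadbb"
```

With `n = 1` this reduces to the pattern above. Whether a run of length n should become n replacement characters or a single one is a design choice; drop the `strrep()` call for the latter.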

Tranbi
  • This throws an error on my console. Adding perl = TRUE fixed it – Maël Feb 07 '23 at 15:56
  • Thanks for the feedback! I'm not familiar with R so didn't know exactly what regex engine it uses... Added your comment to my answer! – Tranbi Feb 07 '23 at 15:59
  • If you're really into pain, consider `foo <- unlist(strsplit(str, '')); bar <- rle(foo)` and then look for instances of `bar$lengths == 1` :-) – Carl Witthoft Feb 07 '23 at 16:30
  • This regex works, but only specifically for single instances of "d". If @Maël wants to find **any** single char, then you'll need a rather more tricky regex, whereas my hack in my previous comment will work. – Carl Witthoft Feb 07 '23 at 16:33

Now that you've added "or repeating a specified number of times," the regex-based approaches may get messy. Thus I submit my wacky code from a previous comment.

foo <- unlist(strsplit(str, ''))  # split into single characters
bar <- rle(foo)                   # run-length encode: bar$lengths, bar$values

Then look for runs where bar$lengths == desired_length and use the returned indices k to locate the position in the original sequence: run k ends at sum(bar$lengths[1:k]). If you only want to replace a specific character, check the corresponding bar$values[k] and selectively replace as desired.
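A minimal sketch of this run-length approach follows. It departs from the description above in two ways: it processes each string separately, because `unlist()` over the whole vector would merge runs across string boundaries, and it rewrites `bar$values` and rebuilds the string with `inverse.rle()` instead of summing lengths to find positions. The function name `replace_runs_rle` and its arguments are hypothetical.

```r
# Hypothetical helper: replace runs of `char` with length exactly `n`,
# using run-length encoding instead of a regex.
replace_runs_rle <- function(x, char, n, replacement) {
  vapply(x, function(s) {
    runs <- rle(strsplit(s, "")[[1]])         # per-string run-length encoding
    hit  <- runs$values == char & runs$lengths == n
    runs$values[hit] <- replacement           # swap the character, keep the run length
    paste(inverse.rle(runs), collapse = "")   # rebuild the string
  }, character(1), USE.NAMES = FALSE)
}

x <- c("ddaabb", "daabb", "aaddbb", "aadbb")  # same data as in the question

replace_runs_rle(x, "d", 1, "c")
# [1] "ddaabb" "caabb"  "aaddbb" "aacbb"
replace_runs_rle(x, "d", 2, "c")
# [1] "ccaabb" "daabb"  "aaccbb" "aadbb"
```

To target any character with the given run length, rather than one specific character, the condition could be relaxed to `runs$lengths == n`.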

Carl Witthoft