1

Code

gsub('101', '111', '110101101')
#[1] "111101111"

Would anyone know why the second 0 in the input isn't being replaced by a 1 in the output? I want to find the pattern 101 in a string and replace it with 111. Later on I wish to turn longer sub-sequences into sequences of 1's, such as 10001 to 11111.

Ronak Shah

5 Answers

6

You could use a lookahead, written (?=...).

The way this works is q(?=u) matches a q that is followed by a u, without making the u part of the match.

Example:

gsub('10(?=1)', '11', '110101101', perl = TRUE)
#[1] "111111111"

Edit: you need to call gsub with perl = TRUE to use lookaheads.
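To see that the lookahead checks the next character without consuming it, here is a minimal sketch in base R. The sample strings and variable names are made up for illustration.

```r
# q(?=u) matches only the "q"; the "u" is checked but not consumed,
# so it is still present after the replacement.
with_u    <- sub("q(?=u)", "Q", "quit", perl = TRUE)
without_u <- sub("q(?=u)", "Q", "qatar", perl = TRUE)

print(with_u)     # "Quit"
print(without_u)  # "qatar": no "u" follows the "q", so nothing matches
```

Because the character inside the lookahead stays in the input, it is still available as the start of the next match, which is what lets 10(?=1) handle overlapping 101 patterns.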

RedSparr0w
  • This increases the length. You meant "gsub('10(?=1)', '11', '110101101')", right? – Clauzzz Dec 15 '19 at 20:28
  • Ah yes, nice catch! – RedSparr0w Dec 15 '19 at 20:31
  • Hey, thanks for the answer! When I run the code I get Error in gsub("10(?=1)", "11", "110101101") : invalid regular expression '10(?=1)', reason 'Invalid regexp'. Any idea why this is happening? Cheers :D. Edit: just saw your edit, I just needed to add perl = TRUE – mexicanseafood Dec 15 '19 at 20:47
5

It's because gsub doesn't work in a recursive way.

gsub('101', '111', '110101101') consumes its third argument as it finds matches. It finds the first 101 and is then left with 01101, so the 1 that ended the first match can no longer start a second one. Think about it: if it replaced "recursively", something like gsub('11', '111', '11') would produce an infinite string of 1s and never finish. It doesn't look for matches in text that has already been replaced.
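The non-overlapping behaviour can be checked directly. This is a small base R sketch; the variable names are only illustrative.

```r
x <- "110101101"

# Start positions of the non-overlapping matches that gsub() would replace.
starts <- as.integer(gregexpr("101", x)[[1]])
print(starts)  # 2 and 7; the overlapping "101" that starts at 4 is skipped

# Replacement text is not rescanned, so this terminates.
once <- gsub("11", "111", "11")
print(once)    # "111"
```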

Clauzzz
2

It is because once R has matched the first 101 in 110101101, that match is consumed, so it treats the next 0 as part of 011 rather than as part of another 101.

It seems that you only want to replace '0' by '1'. Then you can just use gsub('0', '1', '110101101')

yixi zhou
1

Later on I wish to turn longer sub-sequences into sequences of 1's, such as 10001 to 11111.

Hopefully, R provides a means to generate the replacement string based on the matched substring. (This is a common feature.)

If so, search for 10+, and have the replacement string generator create a string consisting of a number of 1 characters equal to the length of the match. (e.g. If 100 is matched, replace with 111. If 1000 is matched, replace with 1111. etc.)

I don't know R in the least. Here's how it's done in some other languages in case that helps:

Perl:

$s =~ s{10+}{ "1" x length($&) }ger

Python:

re.sub(r'10+', lambda match: '1' * len(match.group()), s)

JavaScript:

s.replace(/10+/g, function(match) { return '1'.repeat(match.length) })

JavaScript (ES6):

s.replace(/10+/g, match => '1'.repeat(match.length))
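For R itself, one possible base R sketch of the same idea uses gregexpr() to locate the matches and the replacement form of regmatches() to substitute a computed string. The helper name fill_ones is hypothetical, and strrep() assumes R 3.3.0 or later.

```r
# Hypothetical helper: replace each match of `pattern` with a run of "1"
# characters of the same length as the match.
fill_ones <- function(x, pattern = "10+") {
  m <- gregexpr(pattern, x)
  # One character vector of replacements per input string.
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(hit) strrep("1", nchar(hit))
  )
  x
}

result <- fill_ones(c("110001", "100", "111"))
print(result)  # "111111" "111" "111"
```

As with the snippets above, the pattern 10+ also converts trailing zeros that have no closing 1, as in the second element.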
ikegami
0

According to the OP

Later on I wish to turn longer sub-sequences into sequences of 1's, such as 10001 to 11111.

If I understand correctly, the final goal is to replace any sub-sequence of consecutive 0s with the same number of 1s, provided it is surrounded by a 1 on both sides.

In R, this can be achieved by the str_replace_all() function from the stringr package. For demonstration and testing, the input vector contains some edge cases where substrings of 0 are not surrounded by 1.

input <- c("110101101",
           "11010110001",
           "110-01101",
           "11010110000",
           "00010110001")

library(stringr)
str_replace_all(input, "(?<=1)0+(?=1)", function(x) str_dup("1", str_length(x)))
#[1] "111111111"   "11111111111" "110-01111"   "11111110000" "00011111111"

The regex "(?<=1)0+(?=1)" uses look behind (?<=1) as well as look ahead (?=1) to ensure that the subsequence 0+ to replace is surrounded by 1. Thus, leading and trailing subsequences of 0 are not replaced.

The replacement is computed by a function which returns a subsequence of 1s of the same length as the subsequence of 0s to replace.

Uwe