I am trying to build find-and-remove functionality with regex. I got as far as removing plain strings, but found that strings containing special characters could not be removed. With a plain string such as var word = "is" it seems to work fine, but as soon as the string contains a ., I get strange, unwanted output.

Other unwanted behavior also appeared once I included special characters in the strings I wanted to remove. For example (note that var word = "is." and not "is" in the code below):

var myarray = ["Dr. this is", "this is. iss", "Is this IS"]
var my2array = []
var word = "is."

//var regex = new RegExp(`\\b${word}\\b`, 'gi');
var regex = new RegExp('\\b' + word + '\\b', 'gi');

for (const i of myarray) {
    var x = i.replace(regex, "")
    my2array.push(x)
}
myarray = my2array
console.log(myarray)

["Dr. this is", "this is. ", "this IS"]

The output above is wrong in several ways: for some reason iss is gone, is. (the main string I was trying to remove) remains, and the first Is in the last element is gone.

That is, my desired output in this case would be ["Dr. this is", "this iss", "Is this IS"]

I also tried using a template literal, as can be seen in my commented-out code.

The goal is simply to remove whatever value is in var word from the strings in my array, whether that value is a regular string, a string with special characters, or just special characters (and, of course, while respecting the word breaks I have).

isherwood
find_all
  • `.` matches any character in a regex, and it’s also not a word character. You can escape it as `\.` (i.e. `"is\\."`) to make it match a literal period instead, but the word boundary will still be before it, not after. – Ry- Jul 10 '19 at 17:50
  • Be aware of the XY problem phenomenon. You shouldn't assert a possible solution and necessarily reduce your chances of the best outcome. Take anubhava's answer, for example. – isherwood Jul 10 '19 at 17:58
  • Not related to the question but you could also use [`.map`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map) rather than a for+push. `myarray = myarray.map(str => str.replace(regex, ''));` – Bali Balo Jul 10 '19 at 18:19
  • You need to escape the regex special characters in the string `word`. See: [MDN Regular Expressions - Escaping](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Escaping) and [Escape string for use in Javascript regex](https://stackoverflow.com/q/3446170/3982562) – 3limin4t0r Jul 10 '19 at 18:31

1 Answer

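There are a few issues with the regex approach:

  1. . is a special regex character (it matches any character), so it needs to be escaped in your word
  2. The word boundary \b will not match after . when a space or the end of the string follows, because . is not a word character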
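
Both issues can be reproduced in isolation. The snippet below is a minimal sketch that reuses the strings from the question; the variable names are illustrative only and are not part of the original code.

```javascript
// Issue 1: an unescaped "." matches any character, so "is." also matches "iss".
const unescaped = new RegExp('\\b' + 'is.' + '\\b', 'gi');
const wildcardHits = 'this is. iss'.match(unescaped);
console.log(wildcardHits); // the only match is "iss", not "is."

// Issue 2: even with "." escaped, \b cannot match between "." and a space,
// because neither of them is a word character.
const escaped = /\bis\.\b/i;
const beforeSpace = escaped.test('this is. iss');
const beforeLetter = escaped.test('is.x');
console.log(beforeSpace, beforeLetter); // false true
```

This is why "is." survives in the question's output while "iss" disappears.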

You may use this regex based solution:

var myarray = ["Dr. this is", "this is. iss", "Is this IS"]
var my2array = []

var word = "is."

// using lookahead and lookbehind instead of word boundary;
// 'gi' replaces every occurrence, case-insensitively, as in the question
var regex = new RegExp('\\s*(?<!\\S)' +
          word.replace(/\W/g, "\\$&") + '(?!\\S)\\s*', 'gi')

for (const i of myarray) {
    var x = i.replace(regex, " ")
    my2array.push(x)
}

myarray = my2array
console.log(myarray)
  • .replace(/\W/g, "\\$&") escapes all non-word characters in the given word.
  • (?<!\S) is a negative lookbehind asserting that the previous character is not a non-space character (i.e. it is whitespace or the start of the string)
  • (?!\S) is a negative lookahead asserting that the next character is not a non-space character (i.e. it is whitespace or the end of the string)
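
The same idea can be wrapped in a reusable function. The following is a sketch under a few assumptions rather than a definitive implementation: `escapeRegExp` and `removeWord` are hypothetical helper names, the escaping uses an explicit list of regex metacharacters instead of `\W`, leftover whitespace is collapsed and trimmed (which also normalizes any other runs of whitespace in the input), and the JavaScript engine must support lookbehind (ES2018).

```javascript
// Hypothetical helper: escape regex metacharacters in arbitrary text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove every whitespace-delimited occurrence of
// `word` (case-insensitive) from each string in `strings`.
function removeWord(strings, word) {
  const pattern = new RegExp(
    '(?<!\\S)' + escapeRegExp(word) + '(?!\\S)',
    'gi'
  );
  return strings.map((s) =>
    s
      .replace(pattern, '')  // drop the matched word
      .replace(/\s+/g, ' ')  // collapse the gap left behind (assumption)
      .trim()                // no leading/trailing space
  );
}

const input = ['Dr. this is', 'this is. iss', 'Is this IS'];
console.log(removeWord(input, 'is.'));
// [ 'Dr. this is', 'this iss', 'Is this IS' ]
console.log(removeWord(input, 'is'));
// [ 'Dr. this', 'this is. iss', 'this' ]
```

Because the word is escaped inside the function, user input can be passed in unchanged.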
anubhava
  • Unfortunately this will not work. If I change my `var word = "is"`, it gets rid of the `is` in `this.` – find_all Jul 10 '19 at 17:59
  • Yes I know and I have posted a regex solution for that purpose. – anubhava Jul 10 '19 at 18:00
  • Thank you, I am on the verge of accepting this answer, but I have one more question: this seems to work even if the `\\.` is not escaped in `var word = "is\\."` For example `var word = "is."`. I am planning on having user input for `var word`, so it would save me trouble if I didn't need to alter the `var word` string in order to make it work. Will this present a problem for me later or is the escaping in `var word` redundant? – find_all Jul 10 '19 at 18:09
  • Escaping is needed when you get something like `is(` from the user, which, if left unescaped, will cause a regex engine error – anubhava Jul 10 '19 at 18:14
  • Check updated answer for runtime escaping of special characters in regex. – anubhava Jul 10 '19 at 18:18
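
To illustrate the point about input such as `is(`, here is a small sketch showing the error raised by an unescaped pattern and how runtime escaping with `\W` avoids it. The variable names and the sample sentence are made up for the example.

```javascript
const userInput = 'is('; // example of raw user input

// Unescaped, "(" opens a group that is never closed, so the
// RegExp constructor throws a SyntaxError.
let unescapedError = null;
try {
  new RegExp(userInput);
} catch (err) {
  unescapedError = err;
}
console.log(unescapedError instanceof SyntaxError); // true

// Escaping every non-word character turns "(" into a literal.
const escapedInput = userInput.replace(/\W/g, '\\$&');
const safeRegex = new RegExp(escapedInput);
const matchesLiteral = safeRegex.test('this is( odd');
console.log(escapedInput, matchesLiteral); // is\( true
```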