
I could not find a match to this question.

I have a string like so

var s="one two one-two one-three one one_four"

and my function is as follows

function replaceMatches( str, word )
{
    var pattern = new RegExp( '\\b(' + word + ')\\b', 'g' );
    return str.replace( pattern, '' );
}

the problem is if I run the function like

var problem=replaceMatches( s,'one' )

it returns " two -two -three  one_four"

The function replaces every "one" like it should, but it treats a word with a hyphen as two words and also replaces the "one" before the hyphen.

My question is not about the function but about the regex. What literal regex will match only the standalone word "one" in my string, and not the "one" in "one-two" or in any other hyphenated word ("one-" followed by word characters)?

basically

var pat=/\b(one)\b/g
"one one-two one".replace( pat, '')

I want the above ^ to return

" one-two "

It should only replace the exact match "one" and not the "one" in "one-two". The "one" at the end is important too: the regex must work if the match is at the very end of the string. Thank you, and sorry if my question is a bit confusing. I am just trying to learn and expand my personal library.

user2788832
  • Just use \s instead of \b? – fred02138 Sep 17 '13 at 19:03
  • You can try a negative lookahead: `/\bone(?!\S)/`. If `one` is the latter part of the word (none of your examples had that) you will need some tricks to simulate lookbehind. – Bergi Sep 17 '13 at 19:03
  • @Bergi: `two-one` --> fail. – nhahtdh Sep 17 '13 at 19:04
  • @fred02138: But that would not match `one` in the end of the string, and also remove that whitespace… – Bergi Sep 17 '13 at 19:04
  • `\b` is a word boundary. It means "not a word character". Word characters (`\w`) are `[A-Za-z0-9_]`. The hyphen is not a word character, so it's treated as a word boundary. – gen_Eric Sep 17 '13 at 19:05
  • `s.replace(/\bone(?![\w-])/g, "*")` ? –  Sep 17 '13 at 19:06
  • @RocketHazmat: "not a word character" is `\W`, specifically "a character that is not a word character". A word boundary is [something different](http://www.regular-expressions.info/wordboundaries.html) - a non-consuming zero-length match – Bergi Sep 17 '13 at 19:07
  • @Bergi: `Between two characters in the string, where one is a word character and the other is not a word character.` I know they are different, but I was trying to explain it simply. – gen_Eric Sep 17 '13 at 19:08

3 Answers


What do you consider to be a word?

A word is a sequence of 1 or more word characters, and word boundary \b is defined based upon the definition of word character (and non-word character).

In JavaScript RegExp, the word character \w is shorthand for the character class [a-zA-Z0-9_].

What is your definition of a "word"? Let's say your definition is [a-zA-Z0-9_-].
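To see why the default definition gets in the way, here is a minimal sketch that runs the original \b pattern on the string from the question. The variable names `sample` and `stripped` are only illustrative.

```javascript
// \b sits between a \w character and a non-\w character, so it matches
// between "e" and "-" in "one-two", but not between "e" and "_" in
// "one_four" (the underscore is a word character).
var sample = "one two one-two one-three one one_four";
var stripped = sample.replace(/\bone\b/g, "");

// JSON.stringify makes the leading and doubled spaces visible.
console.log(JSON.stringify(stripped)); // " two -two -three  one_four"
```

The hyphenated words lose their "one" because the hyphen is not part of \w, which is why the definition of a "word" character has to be widened.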

Emulating word boundary

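This post describes how to emulate a word boundary in languages that support look-behind and look-ahead. Unfortunately, JavaScript did not support look-behind when this answer was written (it was added later, in ES2018), so the look-behind has to be emulated.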

Let us assume the word to be replaced is one for simplicity.

We can limit the replacement with the following code:

inputString.replace(/([^a-zA-Z0-9_-]|^)one(?![a-zA-Z0-9_-])/g, "$1")

Note: I use the expanded form [a-zA-Z0-9_-] instead of [\w-] to avoid association with \w.

Break down the regex:

(
  [^a-zA-Z0-9_-]  # Negated character class of "word" character
  |               # OR
  ^               # Beginning of string
)
one               # Keyword
(?!               # Negative look-ahead
  [a-zA-Z0-9_-]   # Word character
)

I emulate the negative look-behind (which would be (?<![a-zA-Z0-9_-]) if supported) by matching either a character from the negated "word" character class or the beginning of the string (^). This works because, if there is no "word" character before the keyword, there must be either a non-"word" character or the beginning of the string there. The alternation is wrapped in a capturing group so that the matched character can be put back by the replacement.

Since one is only replaced if there is no "word" character before or after it, and the look-ahead does not consume the character that follows, there is no risk of missing a match.
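For engines that do support look-behind (it was added to JavaScript in ES2018), the emulation is not needed. The following is only a sketch under that assumption. `removeOne` is a hypothetical helper name, and the "word" character class is the same assumed definition as above.

```javascript
// Assumes an ES2018+ engine with look-behind support.
// Both assertions are zero-width, so nothing has to be put back
// by the replacement string.
function removeOne(inputString) {
    return inputString.replace(/(?<![a-zA-Z0-9_-])one(?![a-zA-Z0-9_-])/g, "");
}

console.log(JSON.stringify(removeOne("one one-two one")));
// " one-two "
console.log(JSON.stringify(removeOne("one two one-two one-three one one_four")));
// " two one-two one-three  one_four"
```

On older engines the look-behind syntax is a syntax error, so the capturing-group emulation remains the portable choice.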

Putting together

Since you are removing "word"s, you must make sure your keyword contains only "word" characters.

function replaceMatches(str, keyword)
{
    // The keyword must not contain non-"word" characters
    if (!/^[a-zA-Z0-9_-]+$/.test(keyword)) {
        throw new Error("not a word");
    }

    // Customize [a-zA-Z0-9_-] and [^a-zA-Z0-9_-] with your definition of
    // "word" character
    var pattern = new RegExp('([^a-zA-Z0-9_-]|^)' + keyword + '(?![a-zA-Z0-9_-])', 'g');
    // $1 puts back the character (if any) that was matched before the keyword
    return str.replace(pattern, '$1');
}

You need to escape meta-characters in the keyword if your definition of "word" character includes regex meta-characters.
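As a rough sketch of that escaping step, assume a "word" definition that also includes the dot, which is a regex meta-character. The names `escapeRegExp`, `replaceMatchesEscaped` and `wordClass` are hypothetical and only serve the illustration.

```javascript
// Escape every regex meta-character so the keyword is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function replaceMatchesEscaped(str, keyword) {
    // Assumed definition of a "word" character: letters, digits, "_", "." and "-".
    var wordClass = 'a-zA-Z0-9_.-';

    if (!new RegExp('^[' + wordClass + ']+$').test(keyword)) {
        throw new Error('not a word');
    }

    var pattern = new RegExp(
        '([^' + wordClass + ']|^)' + escapeRegExp(keyword) + '(?![' + wordClass + '])',
        'g'
    );
    return str.replace(pattern, '$1');
}

console.log(JSON.stringify(replaceMatchesEscaped("a.b axb a.b-c a.b", "a.b")));
// " axb a.b-c "
```

Without the escaping, the dot in "a.b" would match any character, so "axb" would be removed as well.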

nhahtdh
  • With `([^a-zA-Z0-9_-]|^)one` you've just opened up a whole range of characters before `one`. These may not be natural delimiters. –  Sep 17 '13 at 19:45
  • @sln: It is the same with `\b` anyway. `\b` is defined based on `[a-zA-Z0-9_]`, so if you use `\bone`, then the character before `one` can be any Chinese character, control characters, spaces, etc. – nhahtdh Sep 17 '13 at 19:51
  • That's true, but I thought he was moving away from a boundary-like condition to simple delimiter(s). Imagine constructing the 'anti-keyword' class every time if the keyword is dynamic. –  Sep 17 '13 at 20:09
  • @sln: You only have to construct the list of characters that makes up a "word". With the limited support for Unicode in JS, it is a bit of a pain to do this. My solution is intended to be a very general solution, since the OP doesn't indicate what he considers to be a "word". Using pre-built syntax only hides the definition away. – nhahtdh Sep 17 '13 at 20:12
  • well I tried to keep my example simple using a simple string of words. Ideally I would like the function to replace every instance of the word parameter in the string parameter exactly. If I pass "howdy1234" as the word param it should replace every instance that matches "howdy1234" exactly. I want to treat "word-word" as a single word as long as there is no space between characters it should be considered a word. Thank you for the detailed response – user2788832 Sep 17 '13 at 20:34
  • I know replacing an exact match is simple, however the hyphen conundrum made it a lot more difficult – user2788832 Sep 17 '13 at 20:38
  • IMO boundaries are generally useless, good for educational purposes but not for the real world. It's always better to say what's expected rather than to isolate a fish in the ocean like this. I wonder what the OP is going to say when it matches `!howdy123`,44<45 . Maybe it's not what he expects. Just sayin .. –  Sep 18 '13 at 00:47
  • @sln: Of course it is going to match the `!` before, and leave it alone, while removing the keyword (`howdy123` I assume). Well, word boundary is rather useful if "word" character is defined right (not the case in JS, though) - given a natural text for example - but it is out of the scope of this question. – nhahtdh Sep 18 '13 at 04:14

Use this for your RegExp:

function replaceMatches( str, word ) {
  var pattern = new RegExp('(^|[^-])\\b(' + word + ')\\b([^-]|$)', 'g');
  return str.replace(pattern, '$1$3');
}

The (^|[^-]) will match either the start of the string or any character except -. The ([^-]|$) will match either a character other than - or the end of the string.
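One caveat with the pattern above: the trailing group consumes the character after the word, so two occurrences separated by a single character cannot both match. A possible variation, sketched here as an assumption rather than as part of the original answer, turns the trailing group into a look-ahead. `replaceMatchesLookahead` is a hypothetical name.

```javascript
// Same idea as above, but the character after the word is only inspected
// by a look-ahead, not consumed, so it stays available as the leading
// character of the next match.
function replaceMatchesLookahead(str, word) {
  var pattern = new RegExp('(^|[^-])\\b(' + word + ')\\b(?=[^-]|$)', 'g');
  return str.replace(pattern, '$1');
}

console.log(JSON.stringify(replaceMatchesLookahead("one one", "one")));
// " "
console.log(JSON.stringify(replaceMatchesLookahead("one one-two one", "one")));
// " one-two "
```

With the consuming version, "one one" would come back as " one", because the space after the first match is no longer available to the second one.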

Paul
  • Thank you for the quick reply!, the regex for this is more complex than I realized. What is the secret to a conditional RegEx such as this? – user2788832 Sep 17 '13 at 19:12
  • can you explain how it knows not to match the hyphen? – user2788832 Sep 17 '13 at 19:13
  • @user2788832 The `[^-]` is a negated character class. It matches any character that is not a hyphen. You could use `\s` instead if you want it to only match the start of the string or spaces: `new RegExp('(^|\\s)('+word+')(\\s|$)', 'g');` You also don't need the `$1$3` in that case, but you will want to replace the matched pattern with a space `' '`. – Paul Sep 17 '13 at 19:16

I'm not an expert on the JS regex functions, but the function should replace all matches.

As for the hyphen in 'one-two': the position between 'one' and '-' is a word boundary (i.e. \b), and the
end of the string is a word boundary if a \w character comes right before it.

But it sounds like you may want 'one' to be preceded by a space or the beginning of the line (BOL):
([ ]|^)one\b
In that case, make the replacement capture group 1, thus stripping out only 'one'.

And I'm not sure how that function call works in JS.

Edit: given the new expected output, the regex could be:

([ ]|^)one(?=[ ]|$)
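In JS, that regex could be plugged into a function along these lines. This is only a sketch: `removeSpaceDelimited` is a hypothetical name, and the word is assumed to contain no regex meta-characters.

```javascript
// Treats a space (or the start/end of the string) as the only delimiter.
// Group 1 holds the leading space, if any, and is put back by '$1'.
function removeSpaceDelimited(str, word) {
    var pattern = new RegExp('([ ]|^)' + word + '(?=[ ]|$)', 'g');
    return str.replace(pattern, '$1');
}

console.log(JSON.stringify(removeSpaceDelimited("one one-two one", "one")));
// " one-two "
console.log(JSON.stringify(removeSpaceDelimited("one?two one", "one")));
// "one?two "
```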

  • Well, your answer is the same as one of the deleted answer. Not sure what OP thinks about this case: `one?two`, but if spaces is the only word-separator, then this is a fine solution. – nhahtdh Sep 17 '13 at 20:15
  • @nhahtdh - Spaces are a good cue/choice for delimiters. In that case `one?two` doesn't match. And I usually don't look at deleted answers, mostly because I never see them. –  Sep 18 '13 at 00:56
  • And there is always that limited whitespace class `[^\S\r\n\f]` or `[\ \t]`, which are good delimiters. –  Sep 18 '13 at 01:04