0

Based off Regex Until But Not Including, I'm trying to match all characters up until a word boundary.

For example - matching apple in the following string:

apple<

I'm doing that like this:

/a[^\b]+/

This should look for an "a" and then match one or more characters that are not a word boundary, so I would expect it to stop before the <, which sits at the end of the word.

Demo in Regexr

Demo in StackSnippets

var input = [ "apple<", "apple/" ];
var myRegex = /a[^\b]+/;

for (var i = 0; i < input.length; i++) {
  console.log(myRegex.exec(input[i]));  
}

A couple of other regex strings I tried:

I can use a negated word boundary or a negated set with a regular word boundary:

  • /a[\B]+/
  • /a[^\b]+/

I can specify several possible word ending characters and use them in a negated set:

  • /a[^|"<>\-\\\/;:,.]+/

I can also look for a positive set and restrict it to regular letters:

  • /a[\w]+/
  • /a[a-zA-Z]+/

But I'd like to know how to do it for a word boundary if that's possible.

Here's MDN's listing of the word boundary and the characters that constitute it

KyleMit

3 Answers

6

Word boundaries (\b) are not characters, but the empty string between a word character and a non-word character (or the start or end of the string). Moreover, since Unicode support is still lacking in JavaScript, "word character" means only what \w matches: ASCII letters, digits and the underscore.

Because of that, you

  • generally shouldn't use \b unless your data is some kind of computer language that can't possibly include Unicode
  • can't apply quantifiers to \b (an empty string times 10 is still one empty string)
  • can't negate \b (it's not a character set, so it has no complement)
  • can't include \b in a character set (in square brackets) since, again, it's not a character or character set; inside brackets, JavaScript reads \b as the backspace character instead
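To make the last point concrete, here is a minimal sketch of what the question's pattern actually does. The variable names are only for illustration. Since \b inside square brackets is read as a backspace (U+0008), [^\b] means "anything except a backspace", and the match runs straight through the <:

```javascript
// Inside a character class, \b is the backspace character, not a word boundary.
const matchesBackspace = /[\b]/.test("\b");      // "\b" in a string literal is U+0008
const matchesOrdinaryText = /[\b]/.test("apple<");

// So [^\b]+ consumes every character that is not a backspace, including "<".
const questionPattern = /a[^\b]+/;
const questionResult = questionPattern.exec("apple<")[0];

console.log(matchesBackspace);    // true
console.log(matchesOrdinaryText); // false
console.log(questionResult);      // "apple<"
```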

Since \b doesn't actually add any characters to the match, you can safely append it to your regex:

/.+?\b/

will match all characters up until the first word boundary. It's in fact a superset of:

/\w+/

which is probably what you want, since you're interested only in the words, not the stuff in between.
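A quick sketch of the difference between the two patterns. Only "apple<" and "apple/" come from the question; the "<apple" input is made up to show where the lazy pattern picks up non-word characters. The last pattern, anchored on a literal "a", is a variation that is not part of the answer above:

```javascript
const lazyToBoundary = /.+?\b/;
const wordOnly = /\w+/;

// On the question's input both patterns stop before "<".
const lazyOnApple = lazyToBoundary.exec("apple<")[0];
const wordOnApple = wordOnly.exec("apple<")[0];

// With a leading non-word character, the lazy pattern matches that character
// and stops at the boundary just before "a"; \w+ skips ahead to the word itself.
const lazyOnLeading = lazyToBoundary.exec("<apple")[0];
const wordOnLeading = wordOnly.exec("<apple")[0];

// Variation (an assumption about what the asker wants): start at a literal "a".
const fromA = /a.+?\b/.exec("apple/")[0];

console.log(lazyOnApple, wordOnApple);     // apple apple
console.log(lazyOnLeading, wordOnLeading); // < apple
console.log(fromA);                        // apple
```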

Touffy
1

You have to include the word boundary as part of your regex like this:

/[A-Za-z]+\b/

Working demo

You could also use:

\w+\b

Although this will include the underscore as part of your word
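As a rough illustration of the underscore caveat, the sketch below runs both patterns on a made-up input, "foo_bar<". Because the underscore counts as a word character, there is no boundary between "foo" and "_", so the letters-only pattern cannot stop there and ends up matching the later "bar" instead:

```javascript
const lettersThenBoundary = /[A-Za-z]+\b/;
const wordThenBoundary = /\w+\b/;

// Both patterns agree on the question's input.
const lettersOnApple = lettersThenBoundary.exec("apple<")[0];
const wordOnApple = wordThenBoundary.exec("apple<")[0];

// They differ once an underscore is involved ("foo_bar<" is a hypothetical input).
const lettersOnSnake = lettersThenBoundary.exec("foo_bar<")[0];
const wordOnSnake = wordThenBoundary.exec("foo_bar<")[0];

console.log(lettersOnApple, wordOnApple); // apple apple
console.log(lettersOnSnake);              // bar
console.log(wordOnSnake);                 // foo_bar
```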

Federico Piazza
1

If this rewording of the question is accurate (match all words beginning with 'a'), then you might have begun the search with existing SO answers like this one. Distilling that down, you could use the word character class \w, and make it a bit more bulletproof by adding a preceding word boundary \b, which prevents matching partial words that merely contain an 'a', such as 'baggage': /\ba\w+/gi

var input = [ "apple<", "apple/", "baggage;" ];
var myRegexWord = /\ba\w+/i;
var myRegexPartial = /a\w+/;

for (var i = 0; i < input.length; i++) {
  console.log(myRegexWord.exec(input[i]));  
  console.log(myRegexPartial.exec(input[i]));  
}
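The snippet above uses exec without the g flag, so it returns only the first match per string. A minimal sketch of the /gi form mentioned earlier, using String.prototype.match to collect every word that begins with 'a'; the sample sentence is made up for illustration:

```javascript
// Hypothetical input: several words, some starting with "a", one merely containing it.
const sentence = "Apple and baggage are amazing";

// g: find all matches; i: ignore case, so "Apple" counts too.
const wordsStartingWithA = sentence.match(/\ba\w+/gi);

console.log(wordsStartingWithA); // [ 'Apple', 'and', 'are', 'amazing' ]
```

Note that "baggage" is skipped because neither of its a's is preceded by a word boundary.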
Jason Cust
  • This doesn't look like it works. OP is asking for everything from "a" to the end of the word. This is only matching words starting with "a" – redbmk Apr 21 '15 at 17:25
  • @redbmk Hence the caveat to start the answer. All of the examples were words beginning with an 'a'. But if that wasn't the case would the OP want the first occurrence, second, third, etc? :) – Jason Cust Apr 21 '15 at 17:26
  • This is baffling. The answer you posted here is a great answer to a different question. You even included a link to the other question, but that question doesn't have an answer using word boundaries, nor an accepted answer. Why not post this answer on that question instead? – redbmk Apr 21 '15 at 17:32
  • @redbmk I'm not sure what is baffling. The question does not explicitly state that the OP wants partial word matches and furthermore as already stated all of the examples provided were words beginning with an 'a'. It's entirely optional to include the boundary if it's needed or not. Why are you not harping on the other provided answers for completely glossing over one of the few stated requirements: *should look for an "a"*. None of them so far include that. – Jason Cust Apr 21 '15 at 17:37
  • Good point about the other answers ignoring the "a". Still, your answer would be a good addition to that other question you mentioned. – redbmk Apr 21 '15 at 17:45