I am looking for a word boundary that covers these three cases:

  1. beginning of string
  2. end of string
  3. white space

Is there something like that? \b also matches next to characters such as -, / etc.

I would like to replace \b in this pattern with something like that:

(\b\d*\sx\s|\b\d*x|\b)
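
A minimal Python sketch of the problem (the sample string is made up for illustration): because \b fires at every \w/\W transition, ml is found even where it is glued to 10- or 10/.

```python
import re

# Hypothetical sample: "ml" as a standalone word and glued to "-" and "/"
text = "ml 10-ml 10/ml ml"

# \b sees a boundary at every \w/\W transition, so "-" and "/" count too
starts = [m.start() for m in re.finditer(r"\bml\b", text)]
print(starts)  # [0, 6, 12, 15]
```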

edited by Jonathan Leffler
asked by Marcin

3 Answers

Try replacing \b with (?:^|\s|$)

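That means:

(?:   start a non-capturing group (groups the alternatives without creating a capture)
  ^   match the beginning of the string (or of a line, in multiline mode)
  |   or
  \s  match a whitespace character
  |   or
  $   match the end of the string (or of a line, in multiline mode)
)     end of the group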

Works for me in Python and JavaScript.
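
A quick Python check of this suggestion, using a hypothetical word ml and made-up sample strings. The group works as a boundary, but unlike \b it is not zero-width: \s consumes the space it matches, so the space becomes part of the match and cannot be reused by a neighboring match.

```python
import re

# The whitespace-or-edge group from this answer, placed on both sides of a word
pattern = re.compile(r"(?:^|\s|$)ml(?:^|\s|$)")

# The consumed whitespace shows up in the results; "-ml" and "/ml" are skipped
with_space = pattern.findall("ml 10-ml 10/ml ml")
print(with_space)  # ['ml ', ' ml']

# A single space shared by two words is used up by the first match
shared = pattern.findall("ml ml")
print(shared)  # ['ml ']
```

If the whitespace must stay out of the match, the lookarounds (?<!\S) and (?!\S) are zero-width alternatives to the two groups.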

edited by Udo Held
answered by Michael
  • In JavaScript, the outer group includes everything, including the non-capturing group. Using exec() I can get the 2nd group, which excludes the non-capturing group, but how can I use this in replace() in JavaScript? – Fernando Camargo Mar 15 '13 at 13:35
  • Works perfectly in .NET - thanks! That is exactly what I'd been looking for. (The regex I'm trying to match between whitespace/beginning/end of line is different, but the "match whitespace or beginning or end of line" behavior is exactly what I was looking for.) – neminem Oct 24 '13 at 14:55
  • @FernandoCamargo I am trying to figure out the same thing using replace() in javascript – Kdawgwilk May 27 '15 at 19:25
  • @Kdawgwilk did you find a solution? I'm facing exactly the same problem – Jordane Jul 22 '16 at 16:23
  • @Jordane I will see if I can find it tomorrow and I'll let you know – Kdawgwilk Aug 04 '16 at 05:01
  • This also matches the whitespace before and after the word. How can I make it match only the word, without the surrounding whitespace? – DavidNyan10 Feb 21 '22 at 06:45
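
The questions in these comments come down to \s consuming the surrounding whitespace. A minimal Python sketch of two ways around that (the word ml, the replacement mL and the sample string are illustrative only): zero-width lookarounds, or, for engines without lookbehind (JavaScript before ES2018, for instance), capturing the leading whitespace and writing it back in the replacement. In JavaScript's replace() that backreference is written $1 rather than \1.

```python
import re

text = "ml ml 10-ml"

# (?<!\S): start of string or preceded by whitespace
# (?!\S):  end of string or followed by whitespace
# Neither consumes a character, so the match is only the word itself
word = re.compile(r"(?<!\S)ml(?!\S)")
found = word.findall(text)
replaced = word.sub("mL", text)

# Workaround without lookbehind: capture the leading whitespace (or the empty
# start-of-string match) as group 1 and put it back with \1; the trailing side
# uses a lookahead so the shared space is not consumed
replaced_compat = re.sub(r"(^|\s)ml(?=\s|$)", r"\1mL", text)

print(found)            # ['ml', 'ml']
print(replaced)         # mL mL 10-ml
print(replaced_compat)  # mL mL 10-ml
```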

OK, so your real question is:

How do I match a unit, optionally preceded by a quantity, but only if there is either nothing or a space right before the match?

Use

 (?<!\S)\b(?:\d+\s*x\s*)?\d+(?:\.\d+)?\s*ml\b

Explanation

(?<!\S): Assert that it's impossible to match a non-space character before the match.

\b: Match a word boundary

(?:\d+\s*x\s*)?: Optionally match a quantity prefix such as 2 x (integers only)

\d+(?:\.\d+)?: Match a number (decimals optional)

\s*ml\b: Match ml, optionally preceded by whitespace and followed by a word boundary.
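
A minimal Python check of the pattern above, run against a made-up sample string: the decimal quantity and the x multiplier are picked up, while 10ml glued to abc- is rejected by (?<!\S).

```python
import re

# The pattern from this answer, unchanged
unit = re.compile(r"(?<!\S)\b(?:\d+\s*x\s*)?\d+(?:\.\d+)?\s*ml\b")

# Hypothetical sample text
text = "2 x 1.50 ml, 250ml and abc-10ml"
found = unit.findall(text)
print(found)  # ['2 x 1.50 ml', '250ml']
```

Because the final token is \b rather than a whitespace check, a unit followed directly by punctuation, like the comma after 1.50 ml, is still accepted.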

answered by Tim Pietzcker
  • Maybe if you could explain what you really want to do. What problem are you trying to solve? – Tim Pietzcker Oct 26 '10 at 17:29
  • I have posted my pattern above, just wanted to replace \b with this limited word boundary if possible – Marcin Oct 26 '10 at 17:34
  • This is not what "explaining what problem you're trying to solve" means. What do you need the regex *for*? Show some examples of what you're trying to match, what you're trying *not* to match, and possibly any further "rules" the matches have to follow. – Tim Pietzcker Oct 26 '10 at 17:43
  • Looks nice, but it doesn't match floats, e.g. 1.50 ml – Marcin Oct 26 '10 at 18:08
  • OK, changed the regex to accommodate that. You should know and state your requirements clearly - you hadn't said anything about floats before... – Tim Pietzcker Oct 26 '10 at 18:18

Boundaries that you get with \b are not whitespace-sensitive. \b is a zero-width assertion that matches wherever a \w character meets a \W character (in either order), or where a \w character meets the start or end of the string. See this answer for how to write your anchor more precisely, so that you can deal with whitespace the way you want.
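
To see the difference concretely, here is a small Python sketch (the helper positions and the sample string are illustrative only) that lists where each zero-width assertion matches in 10-ml. \b fires at every \w/\W transition, while one possible whitespace-based pair of lookarounds fires only at the edges of the whitespace-delimited token.

```python
import re


def positions(pattern, s):
    """Return the offsets where a zero-width pattern matches in s."""
    return [m.start() for m in re.finditer(pattern, s)]


text = "10-ml"

word_boundaries = positions(r"\b", text)          # every \w/\W transition
token_starts = positions(r"(?<!\S)(?=\S)", text)  # start/whitespace before, non-space after
token_ends = positions(r"(?<=\S)(?!\S)", text)    # non-space before, whitespace/end after

print(word_boundaries)  # [0, 2, 3, 5]
print(token_starts)     # [0]
print(token_ends)       # [5]
```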

edited by Community
answered by tchrist