17

I want to match the word i.v., case-insensitively.

I have this pattern:

(?i)\bi\.v\.

but I want a word boundary at the end.
The above pattern fails in that it matches
i.v.x

but if I try to add a word boundary to the end

(?i)\bi\.v\.\b

it fails in that it does not even match i.v. I think the \b is eating the literal . because . is a word break.
I need the \. to be greedy.

i want to match
sam i.v. sam

do not want to match
sam.i.v.
i.v.sam

This gets closer

(?i)\bi\.v\.\s$

But it fails to find i.v. at the end of a line

paparazzo
  • What is your problem? Why do you want a `\b` at the end of the expression? What can follow this `i.v.` string when it's allowed to match? – Qtax Aug 01 '13 at 21:38
  • @Qtax because I only want a word match. The first pattern will match i.v.x. – paparazzo Aug 01 '13 at 21:42
  • Do you want to match "i.v.x", but not match "xxi.v.x"? What about "i.v. x" (with a space between the . and the x)? – Jim Mischel Aug 01 '13 at 21:46
  • @JimMischel yes I want to match space. That is what I meant by word. I should have been more clear. – paparazzo Aug 01 '13 at 21:48
  • Why don't you want to find `sam.i.v.`? Because there is no space before `i.v.`? Then word boundaries are the wrong tool for this. – Tim Pietzcker Aug 01 '13 at 22:05
  • @TimPietzcker Yes and no. Typically I want :.? to be a word boundary so I can pick up words next to punctuation. Since this one has a . in it, it needed special treatment. – paparazzo Aug 01 '13 at 22:23

4 Answers

27

\b only matches between a word character (a letter, digit, or underscore) and a non-word character (or the start/end of the string). Therefore, it doesn't match after a ., unless a word character immediately follows that dot.
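
To see this rule in action, here is a minimal sketch using Python's re module (an assumption, since the question does not name a regex flavor; \b behaves the same way in most engines). It runs the asker's second attempt against the inputs from the question:

```python
import re

# The asker's second attempt: a word boundary right after the final dot.
pattern = re.compile(r"(?i)\bi\.v\.\b")

# "." followed by a space or by the end of the string: there is no word
# character on either side of that position, so \b cannot match there.
no_match_space = pattern.search("sam i.v. sam")
no_match_end = pattern.search("sam i.v.")

# "." followed by a letter: a boundary exists between "." and "x".
match_letter = pattern.search("i.v.x")

print(no_match_space)        # None
print(no_match_end)          # None
print(match_letter.group())  # i.v.
```

So the trailing \b is not consuming the dot; it simply demands a word character right after it, which is the opposite of what the question wants.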

If your intent is to make sure that no non-whitespace character follows after the dot, then you can specify that using a negative lookahead assertion:

(?i)\bi\.v\.(?!\S)

(?!\S) means "Assert that the next character is not a non-whitespace character".

This may sound a bit convoluted - why the double negative? Why not (?=\s) which means "Assert that the next character is a whitespace character"? Well, there is a subtle difference: The second version requires a whitespace character to be there; that means the regex would fail to match at the end of the string. The first regex handles that corner case as well.
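
The difference between the two lookaheads can be checked with a short sketch. It again assumes Python's re module; the variable names are only illustrative:

```python
import re

negative = re.compile(r"(?i)\bi\.v\.(?!\S)")  # next char must not be non-whitespace
positive = re.compile(r"(?i)\bi\.v\.(?=\s)")  # next char must be whitespace

mid_line = "sam i.v. sam"
end_of_string = "sam I.V."

# Both versions agree when a space follows the final dot.
neg_mid = negative.search(mid_line) is not None       # True
pos_mid = positive.search(mid_line) is not None       # True

# Only the negative lookahead succeeds at the very end of the string.
neg_end = negative.search(end_of_string) is not None  # True
pos_end = positive.search(end_of_string) is not None  # False

# A letter directly after the dot is rejected, as the question requires.
neg_x = negative.search("i.v.x") is not None          # False

print(neg_mid, pos_mid, neg_end, pos_end, neg_x)
```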

If you generally want the concept of "word boundary" to mean "space-delimited", then you need to replace the first \b as well:

(?i)(?<!\S)i\.v\.(?!\S)

or the regex will match sam.i.v. which you don't seem to want it to.
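
As a rough check, the space-delimited pattern can be run against the examples from the question. This is a sketch in Python (an assumed flavor); SPACE_DELIMITED_IV and the sample table are illustrative names, not part of the original post:

```python
import re

# Whitespace (or string start/end) is required on both sides of i.v.
SPACE_DELIMITED_IV = re.compile(r"(?i)(?<!\S)i\.v\.(?!\S)")

# Input text mapped to whether the question wants it to match.
samples = {
    "sam i.v. sam": True,
    "sam I.V.": True,
    "sam.i.v.": False,
    "i.v.sam": False,
    "i.v.x": False,
}

results = {text: SPACE_DELIMITED_IV.search(text) is not None for text in samples}

# For comparison: keeping the leading \b still accepts "sam.i.v.",
# because a boundary exists between "." and "i".
leading_b_matches = re.search(r"(?i)\bi\.v\.(?!\S)", "sam.i.v.") is not None

for text, expected in samples.items():
    print(f"{text!r}: matched={results[text]}, wanted={expected}")
print("leading \\b on 'sam.i.v.':", leading_b_matches)
```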

Tim Pietzcker
  • Your recent edit to your question confused me a bit (I only read it after I had written my answer) - I've commented on your question, could you look at it? I think you need to replace the first word boundary as well... – Tim Pietzcker Aug 01 '13 at 22:06
  • I agree. \b matches on ?i.v. – paparazzo Aug 01 '13 at 22:16
  • I have no clue how it works, but I am happy that GOD made guys like you. Thanks TIM. – quest May 12 '21 at 10:03
2

About your current regex:

You don't need to have \b after dot since dot is not considered a word character but of course dot needs to be escaped:

(?i)\bi\.v\.

But you do need \b before i to make sure it doesn't match inside a longer word, e.g. hi.v.

EDIT: (Based on your further edits)

Try this regex:

(?i)\bi\.v\.(?=\s|$)
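
A small sketch of how this pattern behaves on multi-line text, assuming Python's re module (where $ without re.MULTILINE matches at the end of the string or just before a final newline). The sample text is made up for illustration:

```python
import re

pattern = re.compile(r"(?i)\bi\.v\.(?=\s|$)")

text = "Give I.V. now\nthen i.v.\ni.v.x is not a match\nend i.v."

# The lookahead accepts a space, a newline (which is whitespace),
# or the end of the string, but not the "x" in "i.v.x".
found = pattern.findall(text)
print(found)  # ['I.V.', 'i.v.', 'i.v.']

# The leading \b still lets the pattern match after a dot.
after_dot = pattern.search("sam.i.v.") is not None
print(after_dot)  # True
```

For these inputs the result is the same as with (?!\S); the alternation just spells out the end-of-string case explicitly.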
anubhava
0

You can also put the boundary in place of the last dot.

(?i)\bi\.v\b

The only drawback is that it will also match i.v without the trailing dot.
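
A quick sketch of what this variant actually captures, assuming Python's re module. Note that the match never includes the final dot, and that the i.v.x case from the question is still accepted:

```python
import re

pattern = re.compile(r"(?i)\bi\.v\b")

# The boundary sits between "v" and ".", so the final dot is not captured.
in_sentence = pattern.search("sam i.v. sam").group()
print(in_sentence)  # i.v

# "i.v.x" still matches, because "v" is followed by a non-word ".".
with_x = pattern.search("i.v.x").group()
print(with_x)  # i.v

# A word character directly after "v" removes the boundary.
glued = pattern.search("i.vx")
print(glued)  # None
```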

Benedict Harris
-3

You seem to be confusing the notions of word boundaries and greediness. The best thing you can do is to read these pages:

  • what is a greedy quantifier:

http://www.regular-expressions.info/repeat.html

  • what is a word boundary:

http://www.regular-expressions.info/wordboundaries.html

Once you have read these explanations, I am sure you will think that your problem was ridiculous.

Casimir et Hippolyte