
I have a case in a project where I need to match some text within a description or title.

The requirements for matching are as follows:

Should Match:

a) Any occurrence of "Volume" or "Part" (case-insensitive);

b) Any occurrence of "vol" or "pt" (case-insensitive), except when it has [comma][space] before it AND [period] after it.

I have tried numerous regex patterns on regex101 (count down from version 4 to see the earlier attempts): http://regex101.com/r/lO9vO9/4

In that link, there are a few lines that fail to match but that I would, ideally, like to match:

". pt." should match: it contains "pt" with a trailing period, but the character before it is a period rather than the expected comma.

"The Red Pill, Pt 2" should match: it contains the preceding comma and the "Pt", but there is no period after "Pt".

If someone can help me with this, I would also appreciate a rundown of how the pattern works, so I can figure out where I went wrong.

MrMarlow

2 Answers


You can use this regex:

(,\s(?:vol|pt)\.(*SKIP)(*F)|\b(?:volume|pt|vol|part)\b)


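The first alternative, ,\s(?:vol|pt)\. , matches the form you want to exclude (comma, whitespace, "vol" or "pt", period). The (*SKIP)(*F) that follows then forces that match to fail and tells the engine not to retry from any position inside the skipped text, so the excluded form never appears in the final result. Everything else is handled by the second alternative, \b(?:volume|pt|vol|part)\b, which matches the four words as whole words. The pattern needs the case-insensitive flag (i).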
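(*SKIP)(*F) is a PCRE feature, and Python's built-in re module does not support it. As a rough sketch of the same idea in an engine without those verbs, a common substitute is to match the unwanted form first without capturing it, capture the wanted words in a group, and keep only the matches where that group is set. This is a swapped-in technique, not the pattern from the answer; the names PATTERN and find_markers are made up for illustration.

```python
import re

# Unwanted form first (not captured), wanted words second (captured in group 1).
# When the first alternative matches, it consumes ", vol." or ", pt." so the
# second alternative never gets a chance to match inside it.
PATTERN = re.compile(r",\s(?:vol|pt)\.|\b(volume|part|vol|pt)\b", re.IGNORECASE)


def find_markers(text):
    """Return wanted volume/part words, dropping the ', vol.' / ', pt.' form."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


if __name__ == "__main__":
    for sample in ("The Red Pill, Pt 2", "Some Title, Vol. 3", "Intro. pt. one"):
        print(repr(sample), "->", find_markers(sample))
```

The filtering on group 1 plays the role that (*SKIP)(*F) plays in PCRE: the excluded text is still consumed, but it is discarded instead of being reported.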

More info on (*SKIP) and (*FAIL)

anubhava
  • "volume" should come before "vol" in the last part, otherwise "volume" won't be found. (*SKIP) and (*F) are new to me. Where can I get info on them? – Pedro Gimeno Nov 26 '14 at 19:19
  • Ok that regex is updated, **[see this Q&A for more info on these features](http://stackoverflow.com/questions/19992984/verbs-that-act-after-backtracking-and-failure)** – anubhava Nov 26 '14 at 19:39

So, in other words, you want to allow "pt" and "vol" when they are not preceded by a comma and a space, or when they are not followed by a dot (only the combination of both is forbidden):

volume|part|(?<!, )(?:vol|pt)|(?:vol|pt)(?!\.)


Note: you can improve this pattern by adding a lookahead with a word boundary at the beginning, so that the alternation is only tested for words that begin with "p" or "v". You can also make sure that "vol" or "pt" is not the beginning of a longer word by requiring that no letter follows.

(?=\b[pv])(?:volume|part|(?<!, )(?:vol|pt)|(?:vol|pt)(?!\.))(?![a-z])
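To see how the two patterns above differ, here is a small sketch using Python's re module, which supports the fixed-width lookbehind used here. The names SIMPLE, STRICT, and matches, and the sample strings other than those from the question, are made up for illustration; the case-insensitive flag is assumed, as the question requires.

```python
import re

# First pattern from this answer: plain alternation with lookarounds.
SIMPLE = re.compile(
    r"volume|part|(?<!, )(?:vol|pt)|(?:vol|pt)(?!\.)",
    re.IGNORECASE,
)

# Improved pattern: must start at a word boundary before p/v,
# and must not be followed by another letter.
STRICT = re.compile(
    r"(?=\b[pv])(?:volume|part|(?<!, )(?:vol|pt)|(?:vol|pt)(?!\.))(?![a-z])",
    re.IGNORECASE,
)


def matches(pattern, text):
    """Return every substring of text matched by pattern, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    samples = (
        "The Red Pill, Pt 2",   # comma before, no period after
        "Some Title, Vol. 3",   # comma before and period after (excluded form)
        "Intro. pt. one",       # period before, period after
        "accept the vols",      # "pt" and "vol" inside longer words
    )
    for sample in samples:
        print(repr(sample), matches(SIMPLE, sample), matches(STRICT, sample))
```

On the last sample the first pattern picks up "pt" inside "accept" and "vol" inside "vols", which is the weakness the word boundary and the trailing (?![a-z]) are meant to address.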
Casimir et Hippolyte