1

I'm trying to pull text out of a Word document using the regex lookahead and lookbehind found in this answer:

Regular Expression to find a string included between two characters while EXCLUDING the delimiters

The delimiters I have to work with are

Start: RQ

End: END-RQ

I have added the following (PowerShell) code:

$regex = [regex] '(?<=RQ)(.*?)(?=END-RQ)' 

$matches = $regex.Matches($concat) 

The problem is that the match grabs the RQ inside END-RQ as the start of the next match. Can anyone tell me how to eliminate that (e.g. force the regex to match exactly RQ and END-RQ)? Wrapping the matching patterns in quotes does not seem to work, even when the quotes are escaped.
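For illustration, here is a minimal reproduction of the behaviour described above. The input string is made up (the real document text is not shown), and `$found` is just an illustrative variable name:

```powershell
# Hypothetical one-line sample; the actual Word text is not shown in the question.
$concat = 'RQ first END-RQ filler RQ second END-RQ'

# Original pattern: any "RQ" can open a match, including the one inside "END-RQ".
$regex = [regex] '(?<=RQ)(.*?)(?=END-RQ)'

$found = @($regex.Matches($concat) | ForEach-Object { $_.Value })
$found | ForEach-Object { '[{0}]' -f $_ }
# [ first ]
# [ filler RQ second ]
```

The second match starts right after the `RQ` that closes the first block, so it swallows the filler text and the real opening `RQ` of the second block.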

Dinsdale
  • In this particular case, and assuming that all your "RQ...END-RQ" pairs are balanced, isn't it simpler (and probably faster) to use `RQ((?>[^E]+|E(?!ND-RQ))*)END-RQ` and then extract the capturing group (since you use a capturing group)? – Casimir et Hippolyte Nov 25 '13 at 19:30
  • The groups are guaranteed to be balanced. Thanks for pointing this out though. – Dinsdale Nov 29 '13 at 03:42

3 Answers

5

Try this:

$regex = [regex] '(?<=(?<!END-)RQ)(.*?)(?=END-RQ)'
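As a rough sketch of how this behaves, the nested `(?<!END-)` only lets an `RQ` open a match when it is not immediately preceded by `END-`. The sample input and the `$found` variable are made up for illustration:

```powershell
# Hypothetical one-line sample; substitute the real document text.
$concat = 'RQ first END-RQ filler RQ second END-RQ'

# The lookbehind requires "RQ" that is not itself the tail of "END-RQ".
$regex = [regex] '(?<=(?<!END-)RQ)(.*?)(?=END-RQ)'

$found = @($regex.Matches($concat) | ForEach-Object { $_.Value })
$found | ForEach-Object { '[{0}]' -f $_ }
# [ first ]
# [ second ]
```

Note that `.` does not match a newline by default, so if a block can span several lines the pattern would probably also need the single-line option, e.g. a leading `(?s)`.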
King King
0

You should download this application:

http://www.sellsbrothers.com/posts/Details/12425

It is invaluable when you are trying to debug a regex.

tecshack
0

This might work (hard to say without knowing exactly what your data is):

$regex = [regex]'(?<=(?:^|[^-])RQ)(.*?)(?=END-RQ)'
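A quick check on made-up inputs shows what this does and where the data matters: the lookbehind accepts an `RQ` only at the start of the string or after a character other than `-`, so it rejects every hyphen-prefixed `RQ`, not just `END-RQ`. Both sample strings and the variable names are hypothetical:

```powershell
$regex = [regex]'(?<=(?:^|[^-])RQ)(.*?)(?=END-RQ)'

# Hypothetical input 1: the "RQ" inside "END-RQ" no longer opens a match.
$balanced = 'RQ first END-RQ filler RQ second END-RQ'
$foundBalanced = @($regex.Matches($balanced) | ForEach-Object { $_.Value })
$foundBalanced | ForEach-Object { '[{0}]' -f $_ }
# [ first ]
# [ second ]

# Hypothetical input 2: an opening "RQ" that happens to follow a hyphen is skipped too.
$hyphenated = 'X-RQ body END-RQ'
$foundHyphenated = @($regex.Matches($hyphenated) | ForEach-Object { $_.Value })
$foundHyphenated.Count
# 0
```

If the real text can contain an opening `RQ` directly after a hyphen, a check against the full `END-` prefix would be the safer choice.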
mjolinor