I have this piece of text:

W/NNP Yes/NNP Get/NNP Paid/NNP for/IN Going/NNP to/TO College/NNP !/. Check/NNP it/PRP out/RP here/RB !/. http/NN :/: //sldollar.notlong.com/JJ apple/NN iphone/NN TGIF/NNP swine/NN flu/NN

I am currently using this regex to capture some regions of interest:

[a-zA-Z]*/NN[PS]* [a-zA-Z]*/NN[PS]*

I am using RegexPal to test this.

This captures TGIF/NNP swine/NN but not swine/NN flu/NN. Any suggestions on how to fix my regex to capture this?

Legend

2 Answers

In case anyone else needs this, the answer is to use a positive lookahead, so that the second token is checked and captured but not consumed, and can therefore begin the next match:

([a-zA-Z]*/NN[PS]* )(?=([a-zA-Z]*/NN[PS]*))
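For anyone who wants to try this outside RegexPal, here is a minimal sketch in Python. The choice of Python's `re` module and the variable names are mine for illustration, and the input is just the tail of the text from the question. Because the lookahead does not consume its token, each pair has to be reassembled from the two capture groups.

```python
import re

# Tail of the tagged text from the question.
text = "apple/NN iphone/NN TGIF/NNP swine/NN flu/NN"

# Group 1 consumes the first token plus its trailing space.
# Group 2 sits inside the lookahead, so the second token is captured
# but not consumed, and the next search can start on it.
pattern = re.compile(r"([a-zA-Z]*/NN[PS]* )(?=([a-zA-Z]*/NN[PS]*))")

pairs = [m.group(1) + m.group(2) for m in pattern.finditer(text)]
print(pairs)
# ['apple/NN iphone/NN', 'iphone/NN TGIF/NNP', 'TGIF/NNP swine/NN', 'swine/NN flu/NN']
```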
Legend

Multiple matches cannot overlap.

apple/NN iphone/NN TGIF/NNP swine/NN flu/NN
AAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBB
         CCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDD

Matches marked A and B above follow each other, but because piece C starts in the middle of match A (and likewise D within B), C and D are never reported as matches.
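A quick way to see this, sketched here in Python (my choice for illustration, the question itself used RegexPal): running the original pattern over the fragment returns only A and B.

```python
import re

text = "apple/NN iphone/NN TGIF/NNP swine/NN flu/NN"

# The original pattern from the question: both tokens are consumed,
# so the search resumes after the second token of each match.
pattern = re.compile(r"[a-zA-Z]*/NN[PS]* [a-zA-Z]*/NN[PS]*")

matches = pattern.findall(text)
print(matches)
# ['apple/NN iphone/NN', 'TGIF/NNP swine/NN']
```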

You need to match once and then re-search at some point after the previous starting point, or use lookahead so the latter part isn't consumed.
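The first option (match, then re-search from a point after the previous start) could look roughly like this in Python. This is only a sketch: the loop and the names `pos` and `overlapping` are my own, and it assumes tokens are separated by single spaces, as in the question's text. It resumes just past the first token of each match rather than one character later, so it does not pick up mid-word matches.

```python
import re

text = "apple/NN iphone/NN TGIF/NNP swine/NN flu/NN"

# Original pattern, without lookahead.
pattern = re.compile(r"[a-zA-Z]*/NN[PS]* [a-zA-Z]*/NN[PS]*")

overlapping = []
pos = 0
while True:
    m = pattern.search(text, pos)
    if m is None:
        break
    overlapping.append(m.group(0))
    # Every match contains a space, so index() always finds one here.
    # Restart at the second token of this match.
    pos = text.index(" ", m.start()) + 1

print(overlapping)
# ['apple/NN iphone/NN', 'iphone/NN TGIF/NNP', 'TGIF/NNP swine/NN', 'swine/NN flu/NN']
```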

Anon