I am trying to pull some info out of a text file. I can match what I need; the problem is that I get too many matches.

The information repeats itself a few times in the text. There is unique text between the repeats, but I can't figure out how to make the regex stop matching once it reaches that text. Putting anything but \s after my lookahead seems to break the regex.

I'm hoping there is a way to do this or, failing that, a way to limit the number of matches it will grab.

Here is what I have now and a sample of what I'm searching:

                  (?<=anniversary\s|\s<plaintext>).+(?=\s+)



<subpod title=''>
   <plaintext>birth of Gustav Schäfer (1988- ): 25th anniversary
birth of Arrelious Benn (1988- ): 25th anniversary
birth of Brad Silberling (1963- ): 50th anniversary
birth of Robert Lavette (1963- ): 50th anniversary
Harvard University founded (1636): 377th anniversary
Germany joins the League of nations (1926): 87th anniversary
first Miss America crowned (1921): 92nd anniversary
&quot;Blondie&quot; is first published (1930): 83rd anniversary
Galveston Hurricane of 1900 (1900): 113th anniversary
USAir Flight 427 crashes (1994): 19th anniversary</plaintext>
   <img src='http://www4b.wolframalpha.com/Calculate/MSP/MSP18771b2386h4e5i137b400002gg7ehc7hh7c2h17?MSPStoreType=image/gif&amp;s=40'
       alt='birth of Gustav Schäfer (1988- ): 25th anniversary
birth of Arrelious Benn (1988- ): 25th anniversary
birth of Brad Silberling (1963- ): 50th anniversary
birth of Robert Lavette (1963- ): 50th anniversary
Harvard University founded (1636): 377th anniversary
Germany joins the League of nations (1926): 87th anniversary

Any help is appreciated.
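
To make the two ideas concrete (stop at the unique text, or cap the number of matches), here is a minimal sketch. It assumes Python, since no language is named here, and it uses a shortened, made-up stand-in for the sample above. The names `text`, `body`, `entry`, `entries` and `first_two` are illustrative only. The idea is to first isolate everything up to the first `</plaintext>`, then match one entry per line inside that slice.

```python
import re
from itertools import islice

# Shortened, made-up stand-in for the sample in the question.
text = """<subpod title=''>
   <plaintext>birth of Gustav Schäfer (1988- ): 25th anniversary
Harvard University founded (1636): 377th anniversary
USAir Flight 427 crashes (1994): 19th anniversary</plaintext>
   <img alt='birth of Gustav Schäfer (1988- ): 25th anniversary
Harvard University founded (1636): 377th anniversary'/>
"""

# Step 1: keep only what sits between the first pair of tags.
# The lazy .*? stops at the first </plaintext>; DOTALL lets . cross newlines.
block = re.search(r"<plaintext>(.*?)</plaintext>", text, re.DOTALL)
body = block.group(1) if block else ""

# Step 2: one match per line that ends in "anniversary".
entry = re.compile(r"^.+anniversary$", re.MULTILINE)
entries = entry.findall(body)

# Alternative: cap the number of matches by slicing the match iterator.
first_two = [m.group(0) for m in islice(entry.finditer(body), 2)]

print(entries)
print(first_two)
```

Scoping the search to `body` means the repeated copy in the `alt` attribute is never seen, so no lookbehind is needed. `islice` is only a cap on how many matches are consumed; it does not change what the pattern matches.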

KitWasHere
  • 1
    Please, [do not try parsing XML with regex](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg). This is [another answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) covering this. – mvp Sep 09 '13 at 03:41
  • You should mention exactly what you're trying to capture. Is it everything between the tags? Do the tags repeat? Are you parsing a simple html fragment or complete html file? – Ravi K Thapliyal Sep 09 '13 at 03:42
  • 1
    @mvp Since when "I am trying to pull some info out of a text file." means "I'm trying to parse a XML file with regex"? Did you even read the question before pasting that all too useful link? – acdcjunior Sep 09 '13 at 03:52
  • Yes everything between plaintext tags, well, every entry between them. "birth" or whatever it is to "anniversary" should be one match. Was hoping there was some sort of curly brackets equivalent for the entire pattern – KitWasHere Sep 09 '13 at 04:09
  • What's your language? – Christophe Sep 09 '13 at 04:35
  • @KitWasHere: your 'text file' example is obviously XML or HTML, that's why – mvp Sep 09 '13 at 06:40
  • Okay.... So pretend this isn't HTML. Is what I'm asking possible? I'm not having problems dealing with tags or anything, and I am matching what I want to, just too many times – KitWasHere Sep 09 '13 at 14:40

0 Answers