
I have a regexp like this:

/<span[^>]*class=\"link[^>]*params=\"(\d+),(\d+),[^>]*>[^<]*from.*?(\d{1,2})(.*?)(\d{4}).*?(year|Year)[^<]*<\/span>/

and a string like this:

<p id="p_195" class="s_16" style="text-indent:6pt;"><span class="link s_8" params="65537,21403229,0,195,0,0" onmouseover="this.style.textDecoration='underline';" onmouseout="this.style.textDecoration='none';" onclick="return onClickLink(event, this);">Sometext from 28&nbsp;september 2013&nbsp;year</span></p>

The trouble is the separator between the day number and "september": it can be either a plain space or &nbsp;. So I changed the regexp to:

... from.*?(\d{1,2})(&nbsp;|\s)(.*?)(\d{4}).*?(year|Year) ...

Now the (&nbsp;|\s) group shows up in my matches, but I do not need it there. How can I group (&nbsp;|\s) without capturing it in the matches?

Andy Lester
user2484836
  • Um... I think you should work on your communication skills. It's hard to understand you. – Jerry Sep 30 '13 at 11:04
  • what we all can understand from your question is blah blah blah :P – Neel Sep 30 '13 at 11:06
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Sep 30 '13 at 14:11

1 Answer

You want a non-capturing group. Try this:

.*?(\d{1,2})(?:&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)

See Kobi's comment on the question for details, and the related question "What is a non-capturing group? What does a question mark followed by a colon (?:) mean?"
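
To see the difference between a capturing and a non-capturing group, here is a minimal sketch. It uses Python's re module purely for illustration (the thread itself is about PHP), and the shortened sample string is an assumption modelled on the one in the question.

```python
import re

# Sample text modelled on the question's string (assumption: attributes trimmed).
html = ('<span class="link s_8" params="65537,21403229,0,195,0,0">'
        'Sometext from 28&nbsp;september 2013&nbsp;year</span>')

# Capturing separator: (&nbsp;|\s) becomes group 2 and shifts the later groups.
capturing = re.compile(
    r'from.*?(\d{1,2})(&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)')

# Non-capturing separator: (?:...) still groups the alternation,
# but it is not reported among the matched groups.
non_capturing = re.compile(
    r'from.*?(\d{1,2})(?:&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)')

groups_capturing = capturing.search(html).groups()
groups_non_capturing = non_capturing.search(html).groups()

print(groups_capturing)      # ('28', '&nbsp;', 'september ', '2013', 'year')
print(groups_non_capturing)  # ('28', 'september ', '2013', 'year')
```

The separator is still required for a match in both patterns; only the second one keeps it out of the results.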

Be careful with non-capturing groups. They are not supported in all regex flavours, and they can break your post-processing code if you rely on group backreference indexes and later change a group to be non-capturing. My advice is to always use named groups in .Net.
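
As a rough illustration of the named-group advice: .NET writes named groups as (?<name>...), while Python's re module, used in this sketch only for illustration, writes (?P<name>...). The group names day, month and year are hypothetical choices, and the sample text is an assumption modelled on the question.

```python
import re

# Sample text modelled on the question's string (an assumption, shortened).
text = 'Sometext from 28&nbsp;september 2013&nbsp;year'

# Named groups are looked up by name, so making the separator
# non-capturing (or adding groups later) does not shift any indexes.
pattern = re.compile(
    r'from.*?(?P<day>\d{1,2})(?:&nbsp;|\s)(?P<month>.*?)(?P<year>\d{4})'
)

match = pattern.search(text)
day = match.group('day')
month = match.group('month').strip()  # the lazy group keeps a trailing space
year = match.group('year')
print(day, month, year)
```

Code that reads match.group('year') keeps working even if the pattern is later reordered or a group is changed between capturing and non-capturing.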

Community
Gusdor