0

I have a quite long regex. Sometimes it responds quickly, and sometimes it takes an extremely long time to finish.

Here is my regex:

<div class=""rwResult bg"">.*?mp3/d/[^>]+>(?<Name>[^<]+)</a>.*?artist:[^>]+>(?<Artist>[^<]+).*?user</span>[^>]+[^""]+""(?<Uploader>[^""]+).*?category:.*?"">.*?"">(?<Category>[^<]+).*?time: (?<Duration>[^ ]+) \| (?<StreamSize>[0-9]+) (?<Weight>[^ ]+) \| listened: (?<Clicks>[0-9]+).*?<a href=""(?<DownloadLink>http://dl[^""]+)

Rather than using a separate regex for each group, I prefer to run a single regex once. Is there any function I could use to detect or avoid the long running time while the regular expression is executing?

I'm working in C# or F#. I hope someone can help with this problem.

Thanks.

Guffa
Juidan Ho
  • You might be interested in this article on catastrophic backtracking (http://www.regular-expressions.info/catastrophic.html), which specifically documents some of the nasty side effects of the `.*?` quantifier. – Juliet Dec 11 '10 at 20:49
  • Thanks, everyone. The website was great and helped me a lot ^^" – Juidan Ho Dec 16 '10 at 12:40

2 Answers

2

It looks like you are trying to parse an XML document using a regular expression. This is not really an optimal approach. My guess is that the slow cases come from backtracking: every `.*?` in your pattern gives the engine another position to retry from whenever a later part of the pattern fails to match.

You could try to rewrite your regular expression, but XML is not a regular language, so regular expressions cannot parse it in the general case.
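If you do keep the regex for now, one way to stop it from hanging is to give it a match timeout. This is only a minimal sketch, and it assumes .NET 4.5 or later, where the `Regex` constructor accepts a `TimeSpan` and throws `RegexMatchTimeoutException` when the limit is exceeded. The helper name `TryMatch`, the 200 ms limit, and the deliberately pathological test pattern are my own choices for illustration, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTimeoutSketch
{
    // Hypothetical helper: returns the Match, or null if the engine
    // exceeded the timeout (typically because of heavy backtracking).
    public static Match TryMatch(string input, string pattern, TimeSpan timeout)
    {
        // Singleline lets '.' cross line breaks, which page-wide patterns usually need.
        var regex = new Regex(pattern, RegexOptions.Singleline, timeout);
        try
        {
            return regex.Match(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Nested quantifiers like (a+)+ are a classic backtracking trap
        // when the input almost matches but fails at the very end.
        string pattern = @"^(a+)+$";
        string input = new string('a', 40) + "!";

        Match m = TryMatch(input, pattern, TimeSpan.FromMilliseconds(200));
        Console.WriteLine(m == null ? "gave up (timeout)"
                                    : (m.Success ? "matched" : "no match"));
    }
}
```

A timeout only limits how long a bad case can run. It does not make the pattern any faster, so the advice about using a real parser still applies.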

To get started, take a look at the document "How to read XML from a file by using Visual C#".

Side note: For an entertaining read on what happens when you try to parse a non-regular language using regular expressions, see this Stack Overflow question.

Community
martineno
1

I think you're using the wrong tool. You really want XPath, and possibly XSLT. The only time you want to use a regex to parse raw XML is when the XML is suspected to be syntactically broken in predictable ways.

Seriously, look at XPath: it's magic for delving into the structure of XML documents and pulling out the bits you want.
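To give a rough idea of what that looks like in C#, here is a small sketch using LINQ to XML with the XPath extension methods from `System.Xml.XPath`. The markup, the `artist` class name, and the `ExtractTracks` helper are all made up for illustration, since the real page source isn't shown in the question. It also assumes the input is well-formed XML, which real-world HTML often is not.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathSketch
{
    // Hypothetical helper: returns "name / artist" for each result block.
    public static List<string> ExtractTracks(string xml)
    {
        var tracks = new List<string>();
        XDocument doc = XDocument.Parse(xml);

        // One XPath query finds every result block; no backtracking involved.
        foreach (XElement result in doc.XPathSelectElements("//div[@class='rwResult bg']"))
        {
            // Queries relative to the current block; the cast yields null if nothing matches.
            string name = (string)result.XPathSelectElement("a[contains(@href, 'mp3/d/')]");
            string artist = (string)result.XPathSelectElement("span[@class='artist']");
            tracks.Add(name + " / " + artist);
        }
        return tracks;
    }

    public static void Main()
    {
        // Assumed, simplified markup standing in for the real page.
        string xml =
            "<results>" +
            "<div class=\"rwResult bg\">" +
            "<a href=\"/mp3/d/1\">Some Song</a>" +
            "<span class=\"artist\">Some Artist</span>" +
            "</div>" +
            "</results>";

        foreach (string track in ExtractTracks(xml))
        {
            Console.WriteLine(track);
        }
    }
}
```

Each field gets its own short query scoped to the result block, so one missing field yields a null for that field instead of making the whole match fail or crawl.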

Steve Bennett