
What I am having trouble with is this: I have a string that looks like this:

<a class="l _HId" href="http://www.cnbc.com/2016/07/28/royal-dutch-shell-second-quarter-net-profit-comes-in-at-118-billion.html" onmousedown="return rwt(this,&#39;&#39;,&#39;&#39;,&#39;&#39;,&#39;1&#39;,&#39;AFQjCNHzDJMd9KzNaZJKrec-FAMNdP8ujw&#39;,&#39;mb_qkV1ZFbNFLJBX-JNceA&#39;,&#39;0ahUKEwiwmbPolJbOAhVJ6xQKHT1QDFkQqQIIGigAMAA&#39;,&#39;&#39;,&#39;&#39;,event)">Shell sees quarterly profits plummet 70% as low <em>oil price</em> bites</a>

which, simplified, essentially looks like this:

<a class="l _HId" href="Link" onmousedown="some gibberish">The String that I need <em>I am guessing this is what I searched</em> bites</a>

It would be very helpful if someone knew how something like that could be achieved. Any form of help is much appreciated.

Thanks in advance.

rory.ap
Panos
  • You need to show us what you tried before we help you. – rory.ap Jul 28 '16 at 12:50
  • But do you want to remove the "gibberish"? or do you want to get "The String that I need"? – Pau C Jul 28 '16 at 12:59
  • I believe [this](http://stackoverflow.com/a/1732454/2465194) may apply? – rtmh Jul 28 '16 at 13:04
  • But then again, [this](http://stackoverflow.com/a/1733489/2465194) post makes a good point that may be more applicable to this situation. – rtmh Jul 28 '16 at 13:06
  • I want to get "The string that I need". Because I am new to C# (I usually write code in C++), I don't know much and have only tried this: MatchCollection m1 = Regex.Matches(html, @"./s*(.+?)/s*", RegexOptions.Singleline); – Panos Jul 28 '16 at 13:19
  • @PanosPtr Perhaps for future reference (as you seem to have found an answer), put this information upfront in the question. Even considering the HTML-regex problem, you would have been more likely to get useful answers. Thorough questions get thorough answers. Thanks for the contribution though, and I'm glad you found at least some answer. – rtmh Jul 28 '16 at 15:00

1 Answer


Indeed, this linked post answered all my questions:

RegEx match open tags except XHTML self-contained tags

It looks like parsing HTML with regex isn't the best idea.
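For anyone landing here with the same problem, here is a minimal sketch of the parser-based alternative. It assumes the third-party HtmlAgilityPack NuGet package, which nobody in this thread named, so treat it as one possible option rather than the recommended one. The class and method names (`AnchorTextDemo`, `ExtractFirstLinkText`) are made up for illustration, and the input is the simplified string from the question.

```csharp
using System;
using HtmlAgilityPack; // assumption: third-party NuGet package, not mentioned in the thread

static class AnchorTextDemo
{
    // Hypothetical helper: returns the visible text of the first <a> element,
    // or null when the markup contains no anchor at all.
    public static string ExtractFirstLinkText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath "//a" selects the first anchor anywhere in the document.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//a");
        if (anchor == null)
        {
            return null;
        }

        // InnerText concatenates the text of nested elements such as <em>;
        // DeEntitize turns entities like &amp; back into plain characters.
        return HtmlEntity.DeEntitize(anchor.InnerText).Trim();
    }

    static void Main()
    {
        // The simplified string from the question.
        string html =
            "<a class=\"l _HId\" href=\"Link\" onmousedown=\"some gibberish\">" +
            "The String that I need <em>I am guessing this is what I searched</em> bites</a>";

        Console.WriteLine(ExtractFirstLinkText(html));
    }
}
```

The parser copes with the attributes and the nested `<em>` on its own, so no pattern has to anticipate them. If the `href` is needed as well, `anchor.GetAttributeValue("href", "")` should return it.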

Community
Panos