I am new to C# and Windows Phone development so forgive me if I am missing the obvious:

I would like to display a thumbnail image from an RSS XML feed located at http://blog.dota2.com/feed/. The image is inside a CDATA tag written in HTML. Here is the XML code:

    <content:encoded>
<![CDATA[
<p>We celebrate Happy Bear Pun Week a day earlier as Lone Druid joins Dota 2&#8242;s cast of heroes.</p> <p><a href="http://media.steampowered.com/apps/dota2/posts/LoneDruid_full.jpg "><img class="alignnone" title="The irony is that he's allergic to fur." src="http://media.steampowered.com/apps/dota2/posts/LoneDruid_small.jpg" alt="The irony is that he's allergic to fur." width="551" height="223" /></a></p> <p>Community things:</p> <ul> <li><a href="http://www.itsgosu.com/game/dota2/articles/ig-monthly-madness-invitational-finals-mar-29_407" target="_blank">It&#8217;s Gosu&#8217;s Monthly Madness</a> tournament finals are tomorrow, March 29th. You don&#8217;t want to miss this, we hear it could be more than we can bear.</li> <li>Bear witness to <a href="http://www.team-dignitas.net/articles/blogs/DotA/1092/Dota-2-Ultimate-Guide-to-Warding/" target="_blank">Team Dignitas&#8217; Ultimate Guide to Warding</a>. This should be required teaching in clawsrooms across the globe.</li> <li>Great Explorer Nullf has <a href="http://nullf.deviantart.com/#/d4ubxiu" target="_blank">compiled the eating habits</a> of the legendary Tidehunter in one handy chart. This might give you paws before deciding to head to the beach.</li> </ul> <p>Bear in mind that there will not be an update next week as we will be hibernating during that time.</p> <p>Today&#8217;s bearlog is available <a href="http://store.steampowered.com/news/7662" target="_blank">here</a>.</p> <p>&nbsp;</p> <p>Bear.</p>
]]>
</content:encoded>

I just need the <img src="http://media.steampowered.com/apps/dota2/posts/LoneDruid_small.jpg" /> so I can use the URL to display the image in my reader app.

I have heard people say not to use Regex because it is bad practice for parsing HTML. I am creating this as a proof of concept, so I don't need to worry about that. I am looking for the quickest way to get the image URL so I can use it in my app.

Does anyone have any help? Thanks in advance, Tom

Tom Gantzer

3 Answers

You can try this if you are ready to use HtmlAgilityPack:

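// Requires: using System.Linq;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(yourstring);
var imgLinks = doc.DocumentNode
    .Descendants("img")
    .Select(n => n.GetAttributeValue("src", null))
    .Where(src => !string.IsNullOrEmpty(src)) // skip <img> tags that have no src
    .ToArray();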
L.B

Assuming your XML looks like this (which I'm sure it doesn't exactly), and that you use these extensions: http://searisen.com/xmllib/extensions.wiki

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:content='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'>
  <content:encoded>
    <![CDATA[
<p>We celebrate ...</p> 
<p>
  <a href="http://media.steampowered.com/apps/dota2/posts/LoneDruid_full.jpg ">
    <img class="alignnone" title="The irony is that he's allergic to fur." 
        src="http://media.steampowered.com/apps/dota2/posts/LoneDruid_small.jpg" />
  </a>
</p> 
<p>the rest removed</p> 
]]>
  </content:encoded>
</root>

This gets the image source from the second paragraph. It is hard-coded and ugly, but you said that was all you needed. For it to work you will have to give the full path to the element (path/to/content:encoded), and if that element sits in a group (an array of items), it gets more complicated. My code shows how to separate out an array (see paras):

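// Get and GetElements come from the extension library linked above.
XElement root = XElement.Load(file); // or XElement.Parse(string)
// "&nbsp;" is not a predefined XML entity, so replace it before parsing the HTML as XML.
string html = root.Get("content:encoded", string.Empty).Replace("&nbsp;", " ");
XElement xdata = XElement.Parse(string.Format("<root>{0}</root>", html));
XElement[] paras = xdata.GetElements("p").ToArray();
string src = paras[1].Get("a/img/src", string.Empty); // paras[1] is the second <p>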

PS: This works only because the HTML is well-formed. If it isn't, you'll have to use the HtmlAgilityPack as others have answered. You can feed it the HTML returned from Get("content:encoded", ...).
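
If you would rather not depend on the extension library, the same idea can be sketched with System.Xml.Linq alone. This is only a sketch under assumptions: the class and method names (FeedThumbnails, FirstImageSrc) are made up for illustration, the namespace URI is the one RSS feeds conventionally bind to the content prefix (check the xmlns:content attribute on your feed's root element), and it takes the first img it finds rather than the one in the second paragraph. Like the code above, it throws an XmlException if the embedded HTML is not well-formed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FeedThumbnails
{
    // Assumed namespace for the "content" prefix; verify it against the feed.
    private static readonly XNamespace Content =
        "http://purl.org/rss/1.0/modules/content/";

    // Returns the src of the first <img> in the first <content:encoded>
    // element, or null if there is no such element, image, or attribute.
    public static string FirstImageSrc(string rssXml)
    {
        XDocument feed = XDocument.Parse(rssXml);
        XElement encoded = feed.Descendants(Content + "encoded").FirstOrDefault();
        if (encoded == null)
        {
            return null;
        }

        // Value yields the text inside the CDATA section. "&nbsp;" is not a
        // predefined XML entity, so replace it before parsing the HTML as XML.
        string html = encoded.Value.Replace("&nbsp;", " ");
        XElement body = XElement.Parse("<root>" + html + "</root>");

        XElement img = body.Descendants("img").FirstOrDefault();
        // The explicit string conversion returns null for a missing attribute.
        return img == null ? null : (string)img.Attribute("src");
    }
}
```

Calling FeedThumbnails.FirstImageSrc(feedXml) on the downloaded feed text should give you the URL to hand to your image control. To get one thumbnail per post, loop over the item elements and apply the same steps to each.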

Chuck Savage
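// Requires: using System.Text.RegularExpressions;
// A named group is written (?<URL>...), not (<?URL>...).
const string pattern = @"<img.+?src.*?\=.*?""(?<URL>.*?)""";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(myCDataText);
var imageUrl = match.Groups["URL"].Value; // empty string if nothing matched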
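
For a proof of concept it may help to wrap the regex in a small helper that reports a failed match instead of silently returning an empty string. The following is a minimal sketch, not a robust HTML parser: the names ImageUrlFinder and FirstImageUrl are hypothetical, and the pattern is a slightly tightened variant that stays inside a single img tag and accepts single or double quotes around the value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageUrlFinder
{
    // Matches the src attribute of an <img> tag without crossing the tag's ">".
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""'](?<URL>[^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Returns the first image URL found in the HTML, or null if there is none.
    public static string FirstImageUrl(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return null;
        }

        Match match = ImgSrc.Match(html);
        return match.Success ? match.Groups["URL"].Value : null;
    }
}
```

Pass it the text of the CDATA section, for example ImageUrlFinder.FirstImageUrl(myCDataText), and check the result for null before loading the image.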
m.rufca
David Brabant