I have a question.

How can I extract a URL from an RSS feed?

The string which I need to extract is something like this:

><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="screen2" src="http://hereisthelink/screen2.png" alt="screen2" width="261" height="434" border="0" />

This is in the RSS feed of my self-hosted WordPress blog, within the <content:encoded> section.

I want to fetch the first image of each entry and show it together with the title (which already works) in my ListBox.

I have tried many things to achieve this, but nothing works.

I am using the Syndication.dll from Silverlight 3 to extract the feed items.

At the moment I have really hit a wall trying to solve this.

I am open to any suggestions.

MSicc
  • If you can convert your RSS feed to a string on the client side, then you only need to parse it to get your image from it. – BigL Oct 29 '11 at 09:18
  • Maybe this link helps you : http://stackoverflow.com/questions/319591/reading-non-standard-elements-in-a-syndicationitem-with-syndicationfeed – BigL Oct 29 '11 at 09:25
  • I checked your link. This is not what I am looking for. I want to pick up a randomly generated image URL so that I can bind an image to it. I am currently playing around with regex to solve this problem. Any ideas? Thanks in advance – MSicc Oct 29 '11 at 12:23
  • That would have been my third option, but I didn't mention it because regular expressions are not easy to understand and write. I never really went deep into regex, so I can only give you a link where you can start to learn it: http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx – BigL Oct 29 '11 at 18:21
  • That is so right. Regex is really powerful, but also complex. – MSicc Oct 29 '11 at 20:00

2 Answers

You can use the HTML Agility Pack (http://htmlagilitypack.codeplex.com/). There is a version for Windows Phone (HAPPhone in the trunk). After loading the content of the post into a document, you can get its first img descendant:

var firstimage = document.DocumentNode.Descendants("img").FirstOrDefault();
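
To get from that element to the URL itself, read its src attribute. The following is only a minimal sketch, assuming the post content is already available as a string. The class and method names are hypothetical, and the calls are those of the desktop HtmlAgilityPack build, which the Windows Phone build may not match exactly:

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class FirstImage
{
    // Returns the src of the first <img> in an HTML fragment,
    // or null if the fragment contains no <img> element.
    public static string GetUrl(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var firstImage = document.DocumentNode.Descendants("img").FirstOrDefault();
        if (firstImage == null)
        {
            return null;
        }

        // The second argument is the default returned when src is missing.
        return firstImage.GetAttributeValue("src", string.Empty);
    }
}
```

Passing the text of the <content:encoded> element of each feed item to FirstImage.GetUrl should then give a URL that can be bound to an Image in the ListBox.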
MatthieuGD
  • That sounds interesting. I am currently looking into it, but there is no documentation about it on CodePlex. I will have to search a bit more, I think. – MSicc Oct 29 '11 at 20:00
Something like this should work for you:

// html must hold the downloaded feed XML as a string, not the feed URL
var document = XDocument.Parse(html);
var items = new List<Item>();
var channel = document.Root.Element("channel");
foreach (XElement itemElement in channel.Elements("item"))
{
    try
    {
        var item = new Item();
        foreach (XElement keyValue in itemElement.Elements())
        {
            // no ToLower() here: it would corrupt case-sensitive URLs
            var value = keyValue.Value.Trim('\r', '\t', '\n', ' ');
            switch (keyValue.Name.LocalName)
            {
                case "title": item.Title = value; break;
                // LocalName omits the namespace prefix, so <content:encoded> is "encoded"
                case "encoded": item.Content = value; break;

                // TODO: add more fields
            }
        }

        var match = Regex.Match(item.Content ?? "", "<img(.*?) src=\"(.*?)\"[^>]*>");
        item.FirstImageUrl = match.Groups[2].Value;
        items.Add(item);
    }
    catch
    {
        // TODO: handle exception
    }
}
return items;

You only have to finish the switch statement and create the Item class.
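
For reference, the Item class could be as small as the sketch below. The three property names are the ones the loop above assigns to; anything beyond them (link, publication date, and so on) is left out and would be an addition of your own:

```csharp
// Minimal data holder for one feed entry.
public class Item
{
    // Text of the <title> element.
    public string Title { get; set; }

    // Raw HTML taken from the <content:encoded> element.
    public string Content { get; set; }

    // URL of the first image found in Content; empty if none was found.
    public string FirstImageUrl { get; set; }
}
```

Public properties are used instead of fields so that the ListBox item template can bind to Title and FirstImageUrl.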

Rico Suter
  • OK, this uses the HTML which I posted above. But if I change the variable to my RSS feed URL, it does nothing. Any idea? – MSicc Oct 30 '11 at 18:55
  • Possible problems: The regex is case sensitive, perhaps you have `IMG` instead of `img` tags. Perhaps you use ' instead of ". – Rico Suter Oct 31 '11 at 10:12
  • Hi, the URL is http://msicc.net/?feed=rss2. The other problem is that I need the first image of each post, which is in the <content:encoded> field (each post has this field). Thanks for your help – MSicc Oct 31 '11 at 18:05
  • Your html is wrong... image tag has no /> at the end. But I've changed the regex. – Rico Suter Oct 31 '11 at 19:27
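
The case and quoting problems mentioned in these comments can also be handled in the pattern itself. The following is a rough sketch rather than the regex from the answer: the class and method names are hypothetical, and it assumes the src value is always quoted. It ignores case and accepts either single or double quotes:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ImageUrlExtractor
{
    // Group 1 captures a double-quoted src value, group 2 a single-quoted one.
    private static readonly Regex ImgSrc = new Regex(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the src of the first <img> tag, or null if none is found.
    public static string FirstImageUrl(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return null;
        }

        var match = ImgSrc.Match(html);
        if (!match.Success)
        {
            return null;
        }

        return match.Groups[1].Success ? match.Groups[1].Value : match.Groups[2].Value;
    }
}
```

A regex like this is still fragile against unusual markup, for example an attribute such as data-src placed before src, so an HTML parser is the safer choice when the content is not under your control.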