6

I'm using Argotic Syndication Framework for processing feeds.

But the problem is that if I pass Argotic a URL that is not a valid feed (for example, http://stackoverflow.com, which is an HTML page, not a feed), the program hangs (I mean, Argotic gets stuck in an infinite loop).

So, how can I check whether a URL points to a valid feed?

Mahdi Ghiasi

4 Answers

7

Since .NET 3.5 you can use SyndicationFeed.Load, as in the method below. It throws an exception if the document is not a valid feed, and the method turns that into a false result.

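using System;
using System.Diagnostics;
using System.ServiceModel.Syndication;
using System.Xml;

public bool TryParseFeed(string url)
{
    try
    {
        // Dispose the reader so the underlying connection is released.
        using (XmlReader reader = XmlReader.Create(url))
        {
            SyndicationFeed feed = SyndicationFeed.Load(reader);

            foreach (SyndicationItem item in feed.Items)
            {
                Debug.Print(item.Title.Text);
            }
        }
        return true;
    }
    catch (Exception)
    {
        // Any failure to download or parse is treated as "not a valid feed".
        return false;
    }
}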

Or you can try parsing the document yourself:

string xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n<event>This is a Test</event>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);

Then check the root element. For an Atom feed it should be the feed element in the "http://www.w3.org/2005/Atom" namespace (an RSS 2.0 feed has an rss root element instead):

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:re="http://purl.org/atompub/rank/1.0">
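The root-element check described above could be sketched roughly as follows. This is only a minimal illustration: the class and method names (FeedRootCheck, LooksLikeFeed) are hypothetical, and it only recognises Atom 1.0 and RSS 2.0 roots. As the comments below point out, a matching root element does not prove that the rest of the document is a valid feed.

```csharp
using System;
using System.Xml;

public static class FeedRootCheck
{
    private const string AtomNamespace = "http://www.w3.org/2005/Atom";

    // Hypothetical helper: returns true when the XML parses and its root
    // element looks like an Atom 1.0 <feed> or an RSS 2.0 <rss>.
    public static bool LooksLikeFeed(string xml)
    {
        var xmlDoc = new XmlDocument();
        try
        {
            xmlDoc.LoadXml(xml);
        }
        catch (XmlException)
        {
            // Not well-formed XML (typical for an ordinary HTML page).
            return false;
        }

        XmlElement root = xmlDoc.DocumentElement;
        if (root == null)
        {
            return false;
        }

        bool isAtom = root.LocalName == "feed" && root.NamespaceURI == AtomNamespace;
        bool isRss = root.LocalName == "rss" && root.NamespaceURI == "";
        return isAtom || isRss;
    }
}
```

For example, FeedRootCheck.LooksLikeFeed(xml) returns false for the `<event>` document above, because its root element is neither feed nor rss.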

References: http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.syndicationfeed.aspx http://dotnet.dzone.com/articles/systemservicemodelsyndication

Nick Painter
Akira Yamamoto
  • Since my project is a web application, I can't trust the header. Someone may submit an invalid feed that is still valid XML with a valid root element, and then my app hangs... :( – Mahdi Ghiasi Aug 16 '12 at 22:52
  • Thanks for the update. But a question: which feed formats does `System.ServiceModel.Syndication` support? – Mahdi Ghiasi Aug 16 '12 at 22:54
  • Atom 1.0 and RSS 2.0. http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.syndicationfeed.aspx – Akira Yamamoto Aug 16 '12 at 22:57
  • The advantage of your method is that it does not need any web requests to determine whether the feed is valid. The disadvantage is that Argotic supports many more feed types than `System.ServiceModel.Syndication` does. – Mahdi Ghiasi Aug 16 '12 at 23:08
  • @AkiraYamamoto A problem with using SyndicationFeed.Load() is that it will often run into dtd errors on "valid" rss and atom feeds. The issue is that the feed might be "invalid" according to the spec, but valid according to the apps that use them, hence SyndicationFeed.Load() eliminates a bunch of good feeds. – Matthew Jul 19 '15 at 14:29
2

You can use the W3C Feed Validation Service. It has a SOAP API.
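A rough sketch of how that service might be called with a plain GET request is shown below. Everything specific here is an assumption written from memory rather than taken from this thread: the endpoint http://validator.w3.org/feed/check.cgi, the output=soap12 parameter, and the validity element in the response should all be checked against the service's own documentation. The class and method names are hypothetical.

```csharp
using System;
using System.Xml;

public static class FeedValidatorClient
{
    // ASSUMPTION: endpoint and query parameters of the W3C service.
    private const string Endpoint = "http://validator.w3.org/feed/check.cgi";

    // Builds the GET URL that asks the service to validate feedUrl.
    public static string BuildRequestUrl(string feedUrl)
    {
        return Endpoint + "?url=" + Uri.EscapeDataString(feedUrl) + "&output=soap12";
    }

    // ASSUMPTION: the SOAP response contains a <validity> element whose
    // text is "true" or "false". The namespace prefix is ignored.
    public static bool IsReportedValid(string soapResponse)
    {
        var doc = new XmlDocument();
        doc.LoadXml(soapResponse);
        XmlNode validity = doc.SelectSingleNode("//*[local-name()='validity']");
        return validity != null && validity.InnerText.Trim() == "true";
    }
}
```

The response body could be downloaded with, for example, new System.Net.WebClient().DownloadString(FeedValidatorClient.BuildRequestUrl(url)) and then passed to IsReportedValid.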

Dmitry Khryukin
  • Thank you. It seems that your answer is the best. But can you explain a bit more about the SOAP API? How do I contact that API? Is it possible to call it with GET requests? – Mahdi Ghiasi Aug 16 '12 at 23:06
  • @MahdiGhiasi check this article - http://msdn.microsoft.com/en-us/library/ff512390.aspx if it's not clear I'll create an example later. – Dmitry Khryukin Aug 16 '12 at 23:18
  • It would be nice if you create an example :) Thanks – Mahdi Ghiasi Aug 16 '12 at 23:22
  • @MahdiGhiasi ok. in 9-10 hours I'll be free for this. – Dmitry Khryukin Aug 16 '12 at 23:23
  • There is a limitation in this service: 1 request per second. So I can't make requests to this service from the server side. As for the client side, it does not allow Ajax requests either: http://stackoverflow.com/questions/11997256/call-a-external-web-page-cross-domain-with-javascript. Isn't there any way to use this service from the client side? – Mahdi Ghiasi Aug 25 '12 at 07:15
1

You can check the content type. It has to be text/xml. See this question to find the content type.

You can use this code:

// Requires: using System.Net;
var request = WebRequest.Create("http://www.google.com") as HttpWebRequest;
string contentType = "";

if (request != null)
{
    // Dispose the response so the connection is released.
    using (var response = request.GetResponse() as HttpWebResponse)
    {
        if (response != null)
            contentType = response.ContentType;
    }
}
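Once the content type has been read as above, it could be compared against a few media types that feeds are commonly served with. The sketch below is only an illustration: the helper name MayBeFeed and the list of accepted types are assumptions, and because a server can send any content type, a match is a hint rather than proof that the URL is a feed.

```csharp
using System;

public static class FeedContentType
{
    // ASSUMPTION: an illustrative, non-exhaustive list of media types.
    private static readonly string[] FeedLikeTypes =
    {
        "application/rss+xml",
        "application/atom+xml",
        "application/xml",
        "text/xml"
    };

    // Hypothetical helper: true when the Content-Type header value names
    // one of the media types above, ignoring case and parameters.
    public static bool MayBeFeed(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return false;
        }

        // Drop parameters such as "; charset=utf-8".
        string mediaType = contentType.Split(';')[0].Trim();

        return Array.Exists(
            FeedLikeTypes,
            t => string.Equals(t, mediaType, StringComparison.OrdinalIgnoreCase));
    }
}
```

A value such as "text/html; charset=utf-8" is rejected, which is enough to filter out ordinary web pages before handing the URL to a feed parser.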

Thanks to the answer to that question.

Update

To check whether it is a feed address, you can use the W3C Feed Validation Service.

Update2

As BurundukXP said, it has a SOAP API. To work with it, you can read the answer to this question.

Community
ahmadali shafiee
  • Not every XML document is a feed. Also, please read my comment on the other answer. – Mahdi Ghiasi Aug 16 '12 at 22:53
  • @ahmadalishafiee - Your core statement: "It has to be text/xml" is incorrect. First, any response can indicate any content type, so that result alone is not authoritative. Additionally, text/rss+xml is a valid content type for RSS feeds. – Matthew Jun 05 '15 at 17:20
0

If you just want to have it transformed into valid RSS/Atom, you can use http://feedcleaner.nick.pro/ to have it sanitized. Alternatively, you can fork the project.

Lutz Büch