I have an XML file that looks like:

  <results>
    <result>
      <title>Welcome+to+The+JASON+Project%21</title>
      <url>http%3A%2F%2Fwww.jason.org%2F</url>
      <domain />
      <inside_links>
        <inside_link>
          <description>News</description>
          <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FNews%2FNews.aspx</url>
        </inside_link>
        <inside_link>
          <description>register</description>
          <url>http%3A%2F%2Fwww.jason.org%2Fpublic%2Fregistration%2Fregistration.aspx</url>
        </inside_link>
        <inside_link>
          <description>Argonauts</description>
          <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FArgonauts%2FArgonauts.aspx</url>
        </inside_link>
        <inside_link>
          <description>Curriculum</description>
          <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FCurriculum%2FCurriculum.aspx</url>
        </inside_link>
        <inside_link>
          <description>Credits</description>
          <url>http%3A%2F%2Fwww.jason.org%2Fpublic%2FMisc%2FCredits.aspx</url>
        </inside_link>
      </inside_links>
      <inside_keywords>National+Science+Education+Standards, National+Geographic+Society, Physical+Science, Professional+Development, Earth+Science</inside_keywords>
    </result>
  </results>

...And I'm very confused as to how to read it. I simply want to get the Title, Description, and URL into separate strings. Something like:

foreach line in lines
string title = gettitle;
string description = getdescription;
string url = geturl;

...I've read many tutorials, but none of them seem relevant to what I need to do. Can somebody please help me out with this?

John Saunders
jay_t55

3 Answers

If you are using .NET 3.5 or later, I'd suggest using LINQ to XML:

XDocument doc = XDocument.Load(filename);
XElement insideLinks = doc.Root.Element("result").Element("inside_links");
foreach (XElement insideLink in insideLinks.Elements())
{
    string description = (string)insideLink.Element("description");
    string url = (string)insideLink.Element("url");
}

This also lets you use the built-in "query" syntax so you could do something like this...

XDocument doc = XDocument.Load(filename);
XElement insideLinks = doc.Root.Element("result").Element("inside_links");
// <title> belongs to <result>; each <inside_link> holds <description> and <url>.
var allDescriptions = from XElement insideLink 
                      in insideLinks.Elements("inside_link")
                      select (string)insideLink.Element("description");

(edited per comment)
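
To get the title alongside each link's description and URL, the two snippets above can be combined. This is only a minimal sketch: the class name InsideLinkReader and the method ReadRows are hypothetical, the XML is parsed from an inline string rather than a file so the example is self-contained, and the values are left URL-encoded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InsideLinkReader
{
    // Hypothetical helper: returns one { title, description, url } array per <inside_link>.
    public static List<string[]> ReadRows(XDocument doc)
    {
        var rows = from result in doc.Root.Elements("result")
                   let title = (string)result.Element("title")
                   // Elements(...) on a sequence yields nothing if <inside_links> is missing.
                   from link in result.Elements("inside_links").Elements("inside_link")
                   select new[]
                   {
                       title,
                       (string)link.Element("description"),
                       (string)link.Element("url")
                   };
        return rows.ToList();
    }

    public static void Main()
    {
        // Small inline sample in the same shape as the question's file (assumption).
        XDocument doc = XDocument.Parse(
            "<results><result><title>Welcome</title><url>http%3A%2F%2Fwww.jason.org%2F</url>" +
            "<inside_links><inside_link><description>News</description>" +
            "<url>http%3A%2F%2Fwww.jason.org%2FPublic%2FNews%2FNews.aspx</url></inside_link>" +
            "</inside_links></result></results>");

        foreach (string[] row in ReadRows(doc))
            Console.WriteLine("{0} | {1} | {2}", row[0], row[1], row[2]);
    }
}
```

Unlike doc.Root.Element("result"), which only looks at the first result, this walks every result element, so it should still work if the file contains more than one.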

Chris Vig
  • +1 for L2XML. Would suggest casting to string instead of .Value to avoid null issues: (string)insideLink.Element("description") – dahlbyk Oct 11 '09 at 03:10
  • Thanks for pointing that out, I didn't know that was possible. (It also led me to a Google search about overloading cast operators, which I ALSO did not know was possible in C# :D) – Chris Vig Oct 11 '09 at 03:23
  • Glad to help! Not enough libraries provide smart casts so people don't think to use them, but XElement definitely does it right (string, value and nullable types). – dahlbyk Oct 11 '09 at 03:32

To extend the LINQ to XML suggestion, you can use a select clause to create objects to represent the parsed links:

// Needs: using System; using System.Linq; using System.Xml.Linq; using System.Web;
// HttpUtility also requires a project reference to System.Web.dll.
XDocument doc = XDocument.Load(filename);
var links = from link in doc.Descendants("inside_link")
            select new
            {
                Description = (string)link.Element("description"),
                Url = HttpUtility.UrlDecode((string)link.Element("url"))
            };

foreach(var l in links)
    Console.WriteLine("<a href=\"{0}\">{1}</a>", l.Url, l.Description);

Here, links is a sequence of anonymous-type objects with Description and Url properties, where Url has already been URL-decoded. The foreach prints something like this:

<a href="http://www.jason.org/Public/News/News.aspx">News</a>
<a href="http://www.jason.org/public/registration/registration.aspx">register</a>
...
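
If adding a System.Web reference is not an option, WebUtility.UrlDecode in System.Net (available from .NET Framework 4.5, so not on 3.5) should do the same decoding. A minimal sketch, where the class name UrlFieldDecoder and the method Decode are hypothetical and the sample strings are copied from the question's XML:

```csharp
using System;
using System.Net;

public static class UrlFieldDecoder
{
    // Hypothetical helper: decodes one URL-encoded field taken from the XML.
    public static string Decode(string encoded)
    {
        // WebUtility.UrlDecode also turns '+' into a space, like HttpUtility.UrlDecode.
        return WebUtility.UrlDecode(encoded);
    }

    public static void Main()
    {
        Console.WriteLine(Decode("Welcome+to+The+JASON+Project%21")); // Welcome to The JASON Project!
        Console.WriteLine(Decode("http%3A%2F%2Fwww.jason.org%2F"));   // http://www.jason.org/
    }
}
```

The title needs decoding too, since its spaces are encoded as '+' and the exclamation mark as %21.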
dahlbyk
  • thank you so much @dahlbyk, but there is an error and i have absolutely no idea what they mean (ive never done anything with linq or xml before)... can you please help me figure out what these errors mean? It says "HttpUtility does not exist in the current context." please help... +1 – jay_t55 Oct 11 '09 at 03:55
  • HttpUtility lives in System.Web - at the top of your file make sure you have: using System.Web; – dahlbyk Oct 11 '09 at 05:02
  • i actually did that, but still same problem... – jay_t55 Oct 11 '09 at 05:17
  • You need to add a reference, System.Web.dll, to your project. – Chansik Im Oct 11 '09 at 05:43
  • yay! you did it! :D thanks lots and stuff Chansik Im :D:D:D:D:D ..very appreciated.. i was a little confused as to why you need to manually add a reference even after typing system.web, but a different question i found on s/o answered that for me. – jay_t55 Oct 12 '09 at 04:53

Try this, using XmlDocument and XPath:

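// Needs: using System.Xml;
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("yourfile.xml");
// Each <result> has one title and url, plus a list of inside links.
foreach (XmlNode result in xmlDoc.SelectNodes("/results/result"))
{
    string title = result.SelectSingleNode("title").InnerText;
    string url = result.SelectSingleNode("url").InnerText;
    foreach (XmlNode insideLink in result.SelectNodes("inside_links/inside_link"))
    {
        string description = insideLink.SelectSingleNode("description").InnerText;
        // The link's own URL; like the other values, it is still URL-encoded here.
        string linkUrl = insideLink.SelectSingleNode("url").InnerText;
    }
}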
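
One caveat with the loop above: SelectSingleNode returns null when no node matches, so reading .InnerText on a missing element throws a NullReferenceException. A small wrapper can guard against that. This is just a sketch: the class name XmlNodeText and the method TextOrNull are hypothetical, and the XML is loaded from an inline string instead of a file.

```csharp
using System;
using System.Xml;

public static class XmlNodeText
{
    // Hypothetical helper: InnerText of the first node matching xpath, or null if none matches.
    public static string TextOrNull(XmlNode parent, string xpath)
    {
        XmlNode node = parent.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        XmlDocument xmlDoc = new XmlDocument();
        // Inline sample (assumption): has <title> and an empty <domain />, but no <url>.
        xmlDoc.LoadXml("<results><result><title>Welcome</title><domain /></result></results>");
        XmlNode result = xmlDoc.SelectSingleNode("/results/result");

        Console.WriteLine(TextOrNull(result, "title"));        // Welcome
        Console.WriteLine(TextOrNull(result, "url") == null);  // True: element is missing
        Console.WriteLine(TextOrNull(result, "domain") == ""); // True: element exists but is empty
    }
}
```

An empty element such as the question's domain element comes back as an empty string rather than null, so the two cases can be told apart.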
Rubens Farias