1

I have an XML document, read from a file using:

var doc = XDocument.Load(reader);

The XML looks like this:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE document SYSTEM 'xmlschemas/some.dtd'>
<document xmlns='http://www.abcd.com/dxl' version='9.0' someversion='1.0' 
          replicaid='0xxxxDB' form='Test'>
   <item name='From'>
      <text>John Doe</text>
   </item>
   <item name='SentTo'>
      <text>Another John Doe</text>
   </item>
   <item name='ModTime'>
      <datetime dst='true'>20180129T114649,22-02</datetime>
   </item>
   <item name='Body' sign='true' seal='true'>
       <attachmentref name='some.pdf' displayname='some.pdf'>
           <picture height='34px' width='342px'>
                <notesbitmap>
                    lQAmAAAAAAAAAAAAAAABAAAAAAAAAFYBI
                </notesbitmap>
           </picture>

How do I parse XML like this using LINQ, targeting only the 'item' elements whose name attribute has a specific value? I tried the following with no success:

 doc.Descendants("document")
 .Where(item =>
 {
    string cus = (string)item.Element("item");
    return cus != null && cus == "name";
 })
 .Descendants("SentTo")
 .Select(d => d.Value)
 .ToList();

I want to target the item elements whose name attribute is 'From' or 'SentTo'; there are other elements that I may not want to target. Thanks in advance.

Uzair Khan

4 Answers

4

Part of the problem is that you're looking for elements without a namespace, but your document does specify a default namespace. Fortunately, LINQ to XML makes namespace handling easy.

You're also using Descendants to try to find the value of an attribute (I believe) which isn't how that works.

Here's an example that does work - assuming that your aim was to get the <text> content from each <item> element with a name attribute of SentTo:

using System;
using System.Linq;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        var doc = XDocument.Load("test.xml");
        XNamespace ns = "http://www.abcd.com/dxl";
        var sentToValues = doc.Root
            .Elements(ns + "item")
            .Where(item => (string) item.Attribute("name") == "SentTo")
            .Select(item => (string) item.Element(ns + "text"))
            .ToList();
        foreach (var value in sentToValues)
        {
            Console.WriteLine(value);
        }
    }
}
Jon Skeet
  • Is there a way to target multiple tags in a single Linq query, like From, SentTo and SentDate etc all simultaneously? – Uzair Khan Aug 31 '18 at 10:56
  • @MichaelPhilips: Those aren't "tags" (elements) - they're attribute values. You could certainly have `Where(item => interestingNames.Contains((string) item.Attribute("name")))` where `interestingNames` is a list or set of From, SentTo and SentDate, for example. Alternatively, you could parse all items into a Name/Value dictionary, or something similar. There are lots of different approaches here. – Jon Skeet Aug 31 '18 at 11:12
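
A minimal sketch of the filter described in the comment above, combined with the Name/Value dictionary idea. The class name `InterestingNamesSketch`, the method `ExtractItems`, and the inline sample XML are illustrative assumptions, not part of the original answer; the sample omits the DOCTYPE line and most of the question's document so that it stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InterestingNamesSketch
{
    // Hypothetical helper: maps each wanted name attribute value to the
    // content of the <text> child of the matching <item> element.
    public static Dictionary<string, string> ExtractItems(
        string xml, HashSet<string> interestingNames)
    {
        var doc = XDocument.Parse(xml);
        XNamespace ns = "http://www.abcd.com/dxl";

        return doc.Root
            .Elements(ns + "item")
            // The cast yields null when the attribute is missing,
            // and Contains(null) simply returns false.
            .Where(item => interestingNames.Contains((string) item.Attribute("name")))
            // Note: ToDictionary throws if two items share the same name.
            .ToDictionary(
                item => (string) item.Attribute("name"),
                item => (string) item.Element(ns + "text"));
    }

    public static void Main()
    {
        // Cut-down stand-in for the question's document (assumption).
        const string xml = @"<document xmlns='http://www.abcd.com/dxl'>
  <item name='From'><text>John Doe</text></item>
  <item name='SentTo'><text>Another John Doe</text></item>
  <item name='ModTime'><datetime dst='true'>20180129T114649,22-02</datetime></item>
</document>";

        var wanted = new HashSet<string> { "From", "SentTo" };
        foreach (var pair in ExtractItems(xml, wanted))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

A set keeps the membership test cheap as the list of names grows; a plain list or array would work the same way for a handful of values.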
2

You generally do not want to use HtmlAgilityPack for XML parsing. However, if you understand why it might be a problem and still decide to use it, you can do something like:

var relevantItems = doc.DocumentNode
    .Descendants("item")
    .Select(x => new
    {
        Item = x,
        ItemName = x.Attributes.Contains("name") ? x.Attributes["name"].Value : null
    })
    .Where(x => x.ItemName == "From" || x.ItemName == "SentTo")
    .Select(x => x.Item)
    .ToList();
Dmitry Korolev
  • Has the question changed? I don't see anything about the HtmlAgilityPack here. Perhaps I'm missing something? – Jon Skeet Aug 31 '18 at 10:48
1

Your XML declares a default namespace, so you have to use that namespace in your code:

XDocument doc = XDocument.Load(@"XMLFile1.xml");
XNamespace ns = doc.Root.GetDefaultNamespace();

var text = doc.Descendants(ns + "item")
              .Single(c => (string)c.Attribute("name") == "SentTo") // the cast avoids a NullReferenceException when an item has no name attribute
              .Elements(ns + "text")
              .Select(item => (string)item)
              .FirstOrDefault();

If you want the text nodes for more than one name attribute value, then:

// Sample array of the name attribute values to match
string[] strArray = new string[2] { "From", "SentTo" };

var list = doc.Descendants(ns + "item")
              // Exact match on the attribute value; a substring test would also match e.g. name='To'
              .Where(c => strArray == null || strArray.Contains((string)c.Attribute("name")))
              .Elements(ns + "text")
              .Select(item => (string)item)
              .ToList();

er-sho
0

I would probably create an Item class with the properties and attributes that I need and deserialize the XML into it. After that, I would use LINQ to filter the values. Take a look here:

How to Deserialize XML document

http://www.janholinka.net/Blog/Article/11

I hope this helps.
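
A rough sketch of what that deserialization approach might look like with `XmlSerializer`. The class names `Document`, `Item` and `DeserializeSketch`, and the inline sample XML, are assumptions made for illustration; they do not come from the linked articles. The sample leaves out the DOCTYPE line from the question to stay self-contained, and only the `name` attribute and `<text>` child are mapped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical classes mapping only the parts of the document that are needed.
[XmlRoot("document", Namespace = "http://www.abcd.com/dxl")]
public class Document
{
    [XmlElement("item")]
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Item
{
    [XmlAttribute("name")]
    public string Name { get; set; }

    // Stays null for items that have no <text> child (e.g. ModTime).
    [XmlElement("text")]
    public string Text { get; set; }
}

public static class DeserializeSketch
{
    public static Document Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Document));
        using (var reader = new StringReader(xml))
        {
            return (Document) serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Cut-down stand-in for the question's document (assumption).
        const string xml = @"<document xmlns='http://www.abcd.com/dxl' version='9.0'>
  <item name='From'><text>John Doe</text></item>
  <item name='SentTo'><text>Another John Doe</text></item>
  <item name='ModTime'><datetime dst='true'>20180129T114649,22-02</datetime></item>
</document>";

        var wanted = new[] { "From", "SentTo" };

        // Plain LINQ to Objects over the deserialized items.
        var texts = Parse(xml).Items
            .Where(item => wanted.Contains(item.Name))
            .Select(item => item.Text)
            .ToList();

        texts.ForEach(Console.WriteLine);
    }
}
```

Compared with querying the `XDocument` directly, this trades some up-front class definitions for strongly typed access afterwards.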

Badza