154

What's the best way to get the contents of the mixed body element in the code below? The element might contain either XHTML or text, but I just want its contents in string form. The XmlElement type has the InnerXml property which is exactly what I'm after.

The code as written almost does what I want, but includes the surrounding <body>...</body> element, which I don't want.

XDocument doc = XDocument.Load(new StreamReader(s));
var templates = from t in doc.Descendants("template")
                where t.Attribute("name").Value == templateName
                select new
                {
                   Subject = t.Element("subject").Value,
                   Body = t.Element("body").ToString()
                };
Amirhossein Mehrvarzi
Mike Powell

14 Answers

213

I wanted to see which of these suggested solutions performed best, so I ran some comparative tests. Out of interest, I also compared the LINQ methods to the plain old System.Xml method suggested by Greg. The variation was interesting and not what I expected, with the slowest methods being more than 3 times slower than the fastest.

The results ordered by fastest to slowest:

  1. CreateReader - Instance Hunter (0.113 seconds)
  2. Plain old System.Xml - Greg Hurlman (0.134 seconds)
  3. Aggregate with string concatenation - Mike Powell (0.324 seconds)
  4. StringBuilder - Vin (0.333 seconds)
  5. String.Join on array - Terry (0.360 seconds)
  6. String.Concat on array - Marcin Kosieradzki (0.364 seconds)

Method

I used a single XML document with 20 identical nodes (called 'hint'):

<hint>
  <strong>Thinking of using a fake address?</strong>
  <br />
  Please don't. If we can't verify your address we might just
  have to reject your application.
</hint>

The numbers shown as seconds above are the result of extracting the "inner XML" of the 20 nodes, 1000 times in a row, and taking the average (mean) of 5 runs. I didn't include the time it took to load and parse the XML into an XmlDocument (for the System.Xml method) or XDocument (for all the others).
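For anyone wanting to repeat the measurement, a harness along these lines would do. It is only a sketch of the procedure described above, not the code actually used for the numbers: the class and method names (InnerXmlBenchmark, Time, MeanOfRuns) are made up for illustration, and collecting the 'hint' elements outside the timed region is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

public static class InnerXmlBenchmark
{
    // Hypothetical helper: runs `extract` on every <hint> element,
    // `iterations` times in a row, and returns the elapsed seconds.
    public static double Time(XDocument doc, Func<XElement, string> extract, int iterations = 1000)
    {
        // Assumption: the element lookup is done once, outside the timed region.
        var hints = doc.Descendants("hint").ToList();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            foreach (var hint in hints)
            {
                extract(hint);
            }
        }
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    // Mean of `runs` separate timings (5 in the description above).
    public static double MeanOfRuns(XDocument doc, Func<XElement, string> extract, int runs = 5)
    {
        return Enumerable.Range(0, runs).Select(_ => Time(doc, extract)).Average();
    }
}
```

Each method under test is then passed in as a delegate, for example InnerXmlBenchmark.MeanOfRuns(doc, e => string.Concat(e.Nodes())).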

The LINQ algorithms I used were: (C# - all take an XElement "parent" and return the inner XML string)

CreateReader:

var reader = parent.CreateReader();
reader.MoveToContent();

return reader.ReadInnerXml();

Aggregate with string concatenation:

return parent.Nodes().Aggregate("", (b, node) => b += node.ToString());

StringBuilder:

StringBuilder sb = new StringBuilder();

foreach(var node in parent.Nodes()) {
    sb.Append(node.ToString());
}

return sb.ToString();

String.Join on array:

return String.Join("", parent.Nodes().Select(x => x.ToString()).ToArray());

String.Concat on array:

return String.Concat(parent.Nodes().Select(x => x.ToString()).ToArray());

I haven't shown the "Plain old System.Xml" algorithm here as it's just calling .InnerXml on nodes.
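For completeness, the System.Xml route looks roughly like this. It is a minimal sketch rather than the exact benchmark code; the SystemXmlInnerXml class and its Get method are hypothetical names, and the XPath lookup is just one way of getting hold of the node.

```csharp
using System.Xml;

public static class SystemXmlInnerXml
{
    // Parses `xml` into an XmlDocument and returns the inner XML of the
    // first node matching `xpath`, or null if nothing matches.
    public static string Get(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerXml;
    }
}
```

Calling SystemXmlInnerXml.Get("<body>Hello <b>world</b></body>", "/body") returns Hello <b>world</b>, without the surrounding body element.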


Conclusion

If performance is important (e.g. lots of XML, parsed frequently), I'd use Daniel's CreateReader method every time. If you're just doing a few queries, you might want to use Mike's more concise Aggregate method.

If you're working with large XML elements that have lots of child nodes (perhaps hundreds), you'd probably start to see the benefit of using StringBuilder over the Aggregate method, but not over CreateReader. I don't think the Join and Concat methods would ever be more efficient in these conditions because of the penalty of converting a large list to a large array (obvious even here with smaller lists).

Markus Safar
  • StringBuilder version can be written on one line: var result = parent.Elements().Aggregate(new StringBuilder(), (sb, xelem) => sb.AppendLine(xelem.ToString()), sb => sb.ToString()) – Softlion Sep 23 '11 at 14:27
  • 8
    You missed `parent.CreateNavigator().InnerXml` (need `using System.Xml.XPath` for the extension method). – Richard Jul 23 '12 at 17:23
  • I wouldn't have thought you need the `.ToArray()` inside `.Concat`, but it seems to make it faster – drzaus Jan 14 '14 at 18:02
  • In case you don't scroll to the bottom of these answers: consider just stripping the container/root from `.ToString()` per [this answer](http://stackoverflow.com/a/21642095/1037948). Seems even faster... – drzaus Jun 04 '14 at 17:55
  • 2
    You should really wrap that `var reader = parent.CreateReader();` in a using statement. – BrainSlugs83 Mar 18 '15 at 22:04
  • Seconding @Richard 's comment. parent.CreateNavigator().InnerXml is particularly nice for projecting as it's inline. – ccook Sep 28 '16 at 20:54
72

I think this is a much better method (in VB, shouldn't be hard to translate):

Given an XElement x:

Dim xReader = x.CreateReader
xReader.MoveToContent
xReader.ReadInnerXml
Instance Hunter
  • 1
    Nice! This is a lot faster than some of the other methods proposed (I tested them all - see my answer for details). Although all of them do the job, this one does it the fastest - it even seems faster than System.Xml's XmlNode.InnerXml itself! –  Nov 09 '09 at 23:12
  • 4
    XmlReader is disposable, so don't forget to wrap it with using, please (I'd edit the answer myself if I knew VB). – Dmitry Fedorkov Nov 25 '13 at 12:25
22

How about using this extension method on XElement? It worked for me!

public static string InnerXml(this XElement element)
{
    StringBuilder innerXml = new StringBuilder();

    foreach (XNode node in element.Nodes())
    {
        // append node's xml string to innerXml
        innerXml.Append(node.ToString());
    }

    return innerXml.ToString();
}

OR use a little bit of Linq

public static string InnerXml(this XElement element)
{
    StringBuilder innerXml = new StringBuilder();
    // iterate over the element's own child nodes (the original referenced an undefined "doc")
    element.Nodes().ToList().ForEach( node => innerXml.Append(node.ToString()));

    return innerXml.ToString();
}

Note: The code above has to use element.Nodes() as opposed to element.Elements(). It is very important to remember the difference between the two: element.Nodes() returns every child node (XText, XElement, XComment and so on), whereas element.Elements() returns only the child XElements. Attributes (XAttribute) are not nodes, so neither method returns them.
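To make the difference concrete, here is a small sketch; the NodesVersusElements class and its method names are invented for this illustration. Swapping Nodes() for Elements() silently drops the text between the tags:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NodesVersusElements
{
    // Concatenates every child node: text, elements, comments, etc.
    public static string FromNodes(XElement element)
    {
        return string.Concat(element.Nodes().Select(n => n.ToString()));
    }

    // Concatenates child elements only, so any text nodes are lost.
    public static string FromElements(XElement element)
    {
        return string.Concat(element.Elements().Select(e => e.ToString()));
    }
}
```

For XElement.Parse("<body id=\"1\">Hello <b>world</b>!</body>"), FromNodes returns Hello <b>world</b>! while FromElements returns only <b>world</b>. The id attribute appears in neither result because attributes are not child nodes.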

Markus Safar
Vin
17

With all due credit to those who discovered and proved the best approach (thanks!), here it is wrapped up in an extension method:

public static string InnerXml(this XNode node) {
    using (var reader = node.CreateReader()) {
        reader.MoveToContent();
        return reader.ReadInnerXml();
    }
}
Todd Menier
11

Keep it simple and efficient:

String.Concat(node.Nodes().Select(x => x.ToString()).ToArray())
  • Aggregate is memory and performance inefficient when concatenating strings
  • Using Join("", sth) uses a string array twice as big as Concat does, and it looks quite strange in code.
  • Using += looks very odd, but apparently is not much worse than using '+'; it would probably be optimized to the same code, because the assignment result is unused and might be safely removed by the compiler.
  • StringBuilder is so imperative - and everybody knows that unnecessary "state" sucks.
7

I ended up using this:

Body = t.Element("body").Nodes().Aggregate("", (b, node) => b += node.ToString());
Jeff Atwood
Mike Powell
  • That will do a lot of string concatenation - I'd prefer Vin's use of StringBuilder myself. The manual foreach is not a negative. – Marc Gravell Dec 06 '08 at 22:33
  • This method really saved me today, trying to write out an XElement with the new constructor and none of the other methods were lending themselves to it handily, while this one did. Thanks! – delliottg Aug 13 '14 at 20:42
3

Personally, I ended up writing an InnerXml extension method using the Aggregate method:

public static string InnerXml(this XElement thiz)
{
   return thiz.Nodes().Aggregate( string.Empty, ( element, node ) => element += node.ToString() );
}

My client code is then just as terse as it would be with the old System.Xml namespace:

var innerXml = myXElement.InnerXml();
Martin R-L
2

@Greg: It appears you've edited your answer to be a completely different answer. To which my answer is yes, I could do this using System.Xml but was hoping to get my feet wet with LINQ to XML.

I'll leave my original reply below in case anyone else wonders why I can't just use the XElement's .Value property to get what I need:

@Greg: The Value property concatenates all the text contents of any child nodes. So if the body element contains only text it works, but if it contains XHTML I get all the text concatenated together but none of the tags.
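A quick sketch of that difference, for anyone who wants to see it side by side. The ValueVersusInnerXml class and its method names are made up for this example, and the Nodes() concatenation is just one of the approaches from the other answers:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ValueVersusInnerXml
{
    // XElement.Value: the concatenated text of all descendants, tags dropped.
    public static string TextOnly(XElement element)
    {
        return element.Value;
    }

    // Concatenating the child nodes keeps the markup.
    public static string WithMarkup(XElement element)
    {
        return string.Concat(element.Nodes().Select(n => n.ToString()));
    }
}
```

With XElement.Parse("<body>Hello <b>world</b></body>"), TextOnly returns Hello world and WithMarkup returns Hello <b>world</b>.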

Mike Powell
  • I ran into this exact same issue and thought it was a bug: I had 'mixed' content (i.e. `random text child child`) which became `random text childchild` via `XElement.Parse(...).Value` – drzaus Jun 04 '14 at 17:45
1

// using Regex might be faster to simply trim the begin and end element tag

var content = element.ToString();
var matchBegin = Regex.Match(content, @"<.+?>");
content = content.Substring(matchBegin.Index + matchBegin.Length);          
var matchEnd = Regex.Match(content, @"</.+?>", RegexOptions.RightToLeft);
content = content.Substring(0, matchEnd.Index);
user950851
  • 1
    neat. even faster to just use `IndexOf`: `var xml = root.ToString(); var begin = xml.IndexOf('>')+1; var end = xml.LastIndexOf('<'); return xml.Substring(begin, end-begin);` – drzaus Jun 04 '14 at 17:53
1

doc.ToString() or doc.ToString(SaveOptions) does the work. See http://msdn.microsoft.com/en-us/library/system.xml.linq.xelement.tostring(v=vs.110).aspx

user1920925
  • 1
    No, it does not. It also includes the element with all its attributes. Only the content between the start and the end tag is wanted. – Christoph Jun 22 '19 at 23:51
0

Is it possible to use the System.Xml namespace objects to get the job done here instead of using LINQ? As you already mentioned, XmlNode.InnerXml is exactly what you need.

Greg Hurlman
0
var innerXmlAsText= XElement.Parse(xmlContent)
                    .Descendants()
                    .Where(n => n.Name.LocalName == "template")
                    .Elements()
                    .Single()
                    .ToString();

Will do the job for you

Vinod Srivastav
0

Wondering if (notice I got rid of the b+= and just have b+)

t.Element( "body" ).Nodes()
 .Aggregate( "", ( b, node ) => b + node.ToString() );

might be slightly less efficient than

string.Join( "", t.Element( "body" ).Nodes()
                  .Select( n => n.ToString() ).ToArray() );

Not 100% sure...but glancing at Aggregate() and string.Join() in Reflector...I think I read it as Aggregate just appending a returning value, so essentially you get:

string = string + string

versus string.Join, which has some mention in there of FastStringAllocation or something, which makes me think the folks at Microsoft might have put some extra performance boost in there. Of course my .ToArray() call may negate that, but I just wanted to offer up another suggestion.

-2
public static string InnerXml(this XElement xElement)
{
    //remove start tag
    string innerXml = xElement.ToString().Trim().Replace(string.Format("<{0}>", xElement.Name), "");
    //remove end tag
    innerXml = innerXml.Trim().Replace(string.Format("</{0}>", xElement.Name), "");
    return innerXml.Trim();
}