Better way to detect XML?

Question

Currently, I have the following C# code to extract a value out of text. If it's XML, I want the value within it; otherwise, if it's not XML, it can just return the text itself.

String data = "...";
try
{
    return XElement.Parse(data).Value;
}
catch (System.Xml.XmlException)
{
    return data;
}

I know exceptions are expensive in C#, so I was wondering if there is a better way to determine whether the text I'm dealing with is XML or not?

I thought of regex testing, but I don't see that as a cheaper alternative. Note: I'm asking for a less expensive method of doing this.

Exceptions are free; I throw them away all the time. There is nothing wrong with your code above unless you prove there is; it is really just a code smell. Has anybody tested whether the methods below are actually faster, and whether that speed is required? — JustEngland, Nov 13 '11 at 03:32
@JustEngland actually, in most C++ implementations, exceptions are slow. But C# might be a different case. I haven't used C#, so I can't comment on exception performance in C#. In C++ I can loop 400 million iterations per second, but with an exception thrown every iteration, it's less than a million iterations per second. — juhist, Jul 09 '17 at 11:59
Wow, what a thread; I don't even code C# anymore :) The best advice I would give today is to compare all the different parsers in the framework. You might also be able to cheat with some basic checks. There are also third-party XML parsers available that have better performance. — JustEngland, Jul 20 '17 at 18:31
Even when performance is not an issue, it's better to avoid throwing an exception in non-exceptional circumstances. Our tooling is built to find exceptions and they can get in the way when debugging something else. I think this is an example of a "[vexing exception](https://blogs.msdn.microsoft.com/ericlippert/2008/09/10/vexing-exceptions/)", though less severe than the int.Parse example Eric gave. — eisenpony, Nov 13 '18 at 19:24

score 19 · Accepted Answer · edited Oct 15 '12 at 18:55


You could do a preliminary check for a < since all XML has to start with one and the bulk of all non-XML will not start with one.

(Free-hand written.)

// Requires: using System.Xml.Linq;
// Has to have length to be XML
if (!string.IsNullOrEmpty(data))
{
    // If it starts with a < after trimming then it probably is XML
    // Need to do an empty check again in case the string is all white space.
    var trimmedData = data.TrimStart();
    if (string.IsNullOrEmpty(trimmedData))
    {
        return data;
    }

    if (trimmedData[0] == '<')
    {
        try
        {
            return XElement.Parse(data).Value;
        }
        catch (System.Xml.XmlException)
        {
            return data;
        }
    }
}

// Empty, whitespace-only, or not starting with '<': treat it as plain text
return data;

I originally used a regex, but TrimStart()[0] does exactly what that regex would do.

edited Oct 15 '12 at 18:55 by yoozer8 · answered May 21 '09 at 03:11 by Colin Burnett

+1 for the concept since it'll weed out 99% of exceptions, but I don't feel regex is required here. StartsWith or IndexOf would be fine and faster. – annakata May 21 '09 at 08:03

Um, StartsWith won't work since whitespace is permitted and IndexOf would require knowing everything before the index be whitespace. Though IndexOf *could* be used and I'll modify my answer for that. – Colin Burnett May 21 '09 at 13:09
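Along the lines of the IndexOf idea in these comments, here is a minimal sketch (not part of the original answer) that finds the first significant character without allocating a trimmed copy, then leaves the real verdict to XElement.Parse. The class and method names (XmlSniffer, LooksLikeXml, ExtractValue) are hypothetical, and it assumes only the four XML whitespace characters (space, tab, CR, LF) may appear before the first '<'.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; the names are illustrative, not a framework API.
public static class XmlSniffer
{
    // True if the first character that is not XML whitespace
    // (space, tab, CR, LF) is '<'. Does not allocate.
    public static bool LooksLikeXml(string data)
    {
        if (string.IsNullOrEmpty(data))
            return false;

        foreach (char c in data)
        {
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
                continue;       // skip leading XML whitespace
            return c == '<';    // the first significant character decides
        }
        return false;           // the string was all whitespace
    }

    // Only pay for a parse (and a possible exception) when the cheap check passes.
    public static string ExtractValue(string data)
    {
        if (!LooksLikeXml(data))
            return data;

        try
        {
            return XElement.Parse(data).Value;
        }
        catch (XmlException)
        {
            return data; // starts with '<' but is not well-formed
        }
    }
}
```

Plain text such as "base" never reaches the parser, so it never throws; strings that start with '<' but are malformed still pay for one caught exception.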

Rashmi Pandit · Answer 2 · 2009-05-21T07:58:20.737

The code given below will match all the following xml formats:

<text />
<text/>
<text   />
<text>xml data1</text>
<text attr='2'>data2</text>
<text attr='2' attr='4' >data3 </text>
<text attr>data4</text>
<text attr1 attr2>data5</text>

And here's the code:

using System.Text.RegularExpressions;

public class XmlExpresssion
{
    // EXPLANATION OF EXPRESSION
    // <        :   \<{1}
    // text     :   (?<xmlTag>\w+)  : xmlTag is a backreference so that the start and end tags match
    // >        :   >{1}
    // xml data :   (?<data>.*)     : data is a backreference used for the regex to return the element data      
    // </       :   <{1}/{1}
    // text     :   \k<xmlTag>
    // >        :   >{1}
    // (\w|\W)* :   Matches attributes if any

    // Sample match and pattern egs
    // Just to show how I incrementally made the patterns so that the final pattern is well-understood
    // <text>data</text>
    // @"^\<{1}(?<xmlTag>\w+)\>{1}.*\<{1}/{1}\k<xmlTag>\>{1}$";

    //<text />
    // @"^\<{1}(?<xmlTag>\w+)\s*/{1}\>{1}$";

    //<text>data</text> or <text />
    // @"^\<{1}(?<xmlTag>\w+)((\>{1}.*\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

    //<text>data</text> or <text /> or <text attr='2'>xml data</text> or <text attr='2' attr2 >data</text>
    // @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

    private const string XML_PATTERN = @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

    // Build the compiled regex once; constructing it on every call is very expensive
    private static readonly Regex XmlRegex = new Regex(XML_PATTERN, RegexOptions.Compiled);

    // Checks if the string is in xml format
    private static bool IsXml(string value)
    {
        return XmlRegex.IsMatch(value);
    }

    /// <summary>
    /// Assigns the element value to result if the string is xml
    /// </summary>
    /// <returns>true if success, false otherwise</returns>
    public static bool TryParse(string s, out string result)
    {
        if (XmlExpresssion.IsXml(s))
        {
            result = XmlRegex.Match(s).Result("${data}");
            return true;
        }
        else
        {
            result = null;
            return false;
        }
    }

}

Calling code:

string result;
if (!XmlExpresssion.TryParse(s, out result))
    result = s;
Console.WriteLine(result);

I'm somewhat suspicious of this because XML is not a regular language and thus you cannot parse XML with regex: https://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la ...but, for identifying XML, perhaps it could work if you're not doing a full parse. — juhist, Jul 09 '17 at 11:53
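To make that concern concrete, here is a small sketch (my own harness, not part of the answer; the class and method names are hypothetical) that runs the answer's final pattern and XElement.Parse over two strings the pattern accepts but that are not well-formed XML: an attribute with no value, and a bare '<' inside text.

```csharp
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class RegexVsParser
{
    // The final pattern from the answer above, copied verbatim.
    private static readonly Regex Pattern = new Regex(
        @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$");

    public static bool RegexSaysXml(string s) => Pattern.IsMatch(s);

    // The parser is the authority on well-formedness.
    public static bool ParserSaysXml(string s)
    {
        try
        {
            XElement.Parse(s);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For "<text attr>data4</text>" and "<a>1 < 2</a>" the regex reports a match while the parser rejects the input, so the regex can at best pre-filter; it cannot decide.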

cyberconte · Answer 3 · 2009-05-21T18:06:21.150

Update (original post is below): Colin had the brilliant idea of moving the regex instantiation outside of the calls so that each regex is created only once. Here's the new program:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace ConsoleApplication3
{
    delegate String xmltestFunc(String data);

    class Program
    {
        static readonly int iterations = 1000000;

        private static void benchmark(xmltestFunc func, String data, String expectedResult)
        {
            if (!func(data).Equals(expectedResult))
            {
                Console.WriteLine(data + ": fail");
                return;
            }
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)
                func(data);
            sw.Stop();
            Console.WriteLine(data + ": " + (float)((float)sw.ElapsedMilliseconds / 1000));
        }

        static void Main(string[] args)
        {
            benchmark(xmltest1, "<tag>base</tag>", "base");
            benchmark(xmltest1, " <tag>base</tag> ", "base");
            benchmark(xmltest1, "base", "base");
            benchmark(xmltest2, "<tag>ColinBurnett</tag>", "ColinBurnett");
            benchmark(xmltest2, " <tag>ColinBurnett</tag> ", "ColinBurnett");
            benchmark(xmltest2, "ColinBurnett", "ColinBurnett");
            benchmark(xmltest3, "<tag>Si</tag>", "Si");
            benchmark(xmltest3, " <tag>Si</tag> ", "Si" );
            benchmark(xmltest3, "Si", "Si");
            benchmark(xmltest4, "<tag>RashmiPandit</tag>", "RashmiPandit");
            benchmark(xmltest4, " <tag>RashmiPandit</tag> ", "RashmiPandit");
            benchmark(xmltest4, "RashmiPandit", "RashmiPandit");
            benchmark(xmltest5, "<tag>Custom</tag>", "Custom");
            benchmark(xmltest5, " <tag>Custom</tag> ", "Custom");
            benchmark(xmltest5, "Custom", "Custom");

            // "press any key to continue"
            Console.WriteLine("Done.");
            Console.ReadLine();
        }

        public static String xmltest1(String data)
        {
            try
            {
                return XElement.Parse(data).Value;
            }
            catch (System.Xml.XmlException)
            {
                return data;
            }
        }

        static Regex xmltest2regex = new Regex("^[ \t\r\n]*<");
        public static String xmltest2(String data)
        {
            // Has to have length to be XML
            if (!string.IsNullOrEmpty(data))
            {
                // If it starts with a < then it probably is XML
                // But also cover the case where there is indeterminate whitespace before the <
                if (data[0] == '<' || xmltest2regex.Match(data).Success)
                {
                    try
                    {
                        return XElement.Parse(data).Value;
                    }
                    catch (System.Xml.XmlException)
                    {
                        return data;
                    }
                }
            }
            return data;
        }

        static Regex xmltest3regex = new Regex(@"<(?<tag>\w*)>(?<text>.*)</\k<tag>>");
        public static String xmltest3(String data)
        {
            Match m = xmltest3regex.Match(data);
            if (m.Success)
            {
                GroupCollection gc = m.Groups;
                if (gc.Count > 0)
                {
                    return gc["text"].Value;
                }
            }
            return data;
        }

        public static String xmltest4(String data)
        {
            String result;
            if (!XmlExpresssion.TryParse(data, out result))
                result = data;

            return result;
        }

        static Regex xmltest5regex = new Regex("^[ \t\r\n]*<");
        public static String xmltest5(String data)
        {
            // Has to have length to be XML
            if (!string.IsNullOrEmpty(data))
            {
                // If it starts with a < then it probably is XML
                // But also cover the case where there is indeterminate whitespace before the <
                if (data[0] == '<' || data.Trim()[0] == '<' || xmltest5regex.Match(data).Success)
                {
                    try
                    {
                        return XElement.Parse(data).Value;
                    }
                    catch (System.Xml.XmlException)
                    {
                        return data;
                    }
                }
            }
            return data;
        }
    }

    public class XmlExpresssion
    {
        // EXPLANATION OF EXPRESSION
        // <        :   \<{1}
        // text     :   (?<xmlTag>\w+)  : xmlTag is a backreference so that the start and end tags match
        // >        :   >{1}
        // xml data :   (?<data>.*)     : data is a backreference used for the regex to return the element data      
        // </       :   <{1}/{1}
        // text     :   \k<xmlTag>
        // >        :   >{1}
        // (\w|\W)* :   Matches attributes if any

        // Sample match and pattern egs
        // Just to show how I incrementally made the patterns so that the final pattern is well-understood
        // <text>data</text>
        // @"^\<{1}(?<xmlTag>\w+)\>{1}.*\<{1}/{1}\k<xmlTag>\>{1}$";

        //<text />
        // @"^\<{1}(?<xmlTag>\w+)\s*/{1}\>{1}$";

        //<text>data</text> or <text />
        // @"^\<{1}(?<xmlTag>\w+)((\>{1}.*\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

        //<text>data</text> or <text /> or <text attr='2'>xml data</text> or <text attr='2' attr2 >data</text>
        // @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

        private static string XML_PATTERN = @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";
        private static Regex regex = new Regex(XML_PATTERN, RegexOptions.Compiled);

        // Checks if the string is in xml format
        private static bool IsXml(string value)
        {
            return regex.IsMatch(value);
        }

        /// <summary>
        /// Assigns the element value to result if the string is xml
        /// </summary>
        /// <returns>true if success, false otherwise</returns>
        public static bool TryParse(string s, out string result)
        {
            if (XmlExpresssion.IsXml(s))
            {
                result = regex.Match(s).Result("${data}");
                return true;
            }
            else
            {
                result = null;
                return false;
            }
        }

    }


}

And here are the new results, in seconds for 1,000,000 calls each:

<tag>base</tag>: 3.667
 <tag>base</tag> : 3.707
base: 40.737
<tag>ColinBurnett</tag>: 3.707
 <tag>ColinBurnett</tag> : 4.784
ColinBurnett: 0.413
<tag>Si</tag>: 2.016
 <tag>Si</tag> : 2.141
Si: 0.087
<tag>RashmiPandit</tag>: 12.305
 <tag>RashmiPandit</tag> : fail
RashmiPandit: 0.131
<tag>Custom</tag>: 3.761
 <tag>Custom</tag> : 3.866
Custom: 0.329
Done.

There you have it. Precompiled regexes are the way to go, and pretty efficient to boot.

(original post)

I cobbled together the following program to benchmark the code samples that were provided for this question, both to demonstrate the reasoning for my post and to evaluate the speed of the provided answers.

Without further ado, here's the program.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace ConsoleApplication3
{
    delegate String xmltestFunc(String data);

    class Program
    {
        static readonly int iterations = 1000000;

        private static void benchmark(xmltestFunc func, String data, String expectedResult)
        {
            if (!func(data).Equals(expectedResult))
            {
                Console.WriteLine(data + ": fail");
                return;
            }
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)
                func(data);
            sw.Stop();
            Console.WriteLine(data + ": " + (float)((float)sw.ElapsedMilliseconds / 1000));
        }

        static void Main(string[] args)
        {
            benchmark(xmltest1, "<tag>base</tag>", "base");
            benchmark(xmltest1, " <tag>base</tag> ", "base");
            benchmark(xmltest1, "base", "base");
            benchmark(xmltest2, "<tag>ColinBurnett</tag>", "ColinBurnett");
            benchmark(xmltest2, " <tag>ColinBurnett</tag> ", "ColinBurnett");
            benchmark(xmltest2, "ColinBurnett", "ColinBurnett");
            benchmark(xmltest3, "<tag>Si</tag>", "Si");
            benchmark(xmltest3, " <tag>Si</tag> ", "Si" );
            benchmark(xmltest3, "Si", "Si");
            benchmark(xmltest4, "<tag>RashmiPandit</tag>", "RashmiPandit");
            benchmark(xmltest4, " <tag>RashmiPandit</tag> ", "RashmiPandit");
            benchmark(xmltest4, "RashmiPandit", "RashmiPandit");

            // "press any key to continue"
            Console.WriteLine("Done.");
            Console.ReadLine();
        }

        public static String xmltest1(String data)
        {
            try
            {
                return XElement.Parse(data).Value;
            }
            catch (System.Xml.XmlException)
            {
                return data;
            }
        }

        public static String xmltest2(String data)
        {
            // Has to have length to be XML
            if (!string.IsNullOrEmpty(data))
            {
                // If it starts with a < then it probably is XML
                // But also cover the case where there is indeterminate whitespace before the <
                if (data[0] == '<' || new Regex("^[ \t\r\n]*<").Match(data).Success)
                {
                    try
                    {
                        return XElement.Parse(data).Value;
                    }
                    catch (System.Xml.XmlException)
                    {
                        return data;
                    }
                }
            }
            return data;
        }

        public static String xmltest3(String data)
        {
            Regex regex = new Regex(@"<(?<tag>\w*)>(?<text>.*)</\k<tag>>");
            Match m = regex.Match(data);
            if (m.Success)
            {
                GroupCollection gc = m.Groups;
                if (gc.Count > 0)
                {
                    return gc["text"].Value;
                }
            }
            return data;
        }

        public static String xmltest4(String data)
        {
            String result;
            if (!XmlExpresssion.TryParse(data, out result))
                result = data;

            return result;
        }

    }

    public class XmlExpresssion
    {
        // EXPLANATION OF EXPRESSION
        // <        :   \<{1}
        // text     :   (?<xmlTag>\w+)  : xmlTag is a backreference so that the start and end tags match
        // >        :   >{1}
        // xml data :   (?<data>.*)     : data is a backreference used for the regex to return the element data      
        // </       :   <{1}/{1}
        // text     :   \k<xmlTag>
        // >        :   >{1}
        // (\w|\W)* :   Matches attributes if any

        // Sample match and pattern egs
        // Just to show how I incrementally made the patterns so that the final pattern is well-understood
        // <text>data</text>
        // @"^\<{1}(?<xmlTag>\w+)\>{1}.*\<{1}/{1}\k<xmlTag>\>{1}$";

        //<text />
        // @"^\<{1}(?<xmlTag>\w+)\s*/{1}\>{1}$";

        //<text>data</text> or <text />
        // @"^\<{1}(?<xmlTag>\w+)((\>{1}.*\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

        //<text>data</text> or <text /> or <text attr='2'>xml data</text> or <text attr='2' attr2 >data</text>
        // @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

        private const string XML_PATTERN = @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

        // Checks if the string is in xml format
        private static bool IsXml(string value)
        {
            return Regex.IsMatch(value, XML_PATTERN);
        }

        /// <summary>
        /// Assigns the element value to result if the string is xml
        /// </summary>
        /// <returns>true if success, false otherwise</returns>
        public static bool TryParse(string s, out string result)
        {
            if (XmlExpresssion.IsXml(s))
            {
                Regex r = new Regex(XML_PATTERN, RegexOptions.Compiled);
                result = r.Match(s).Result("${data}");
                return true;
            }
            else
            {
                result = null;
                return false;
            }
        }

    }


}

And here are the results. Each one was executed 1 million times.

<tag>base</tag>: 3.531
 <tag>base</tag> : 3.624
base: 41.422
<tag>ColinBurnett</tag>: 3.622
 <tag>ColinBurnett</tag> : 16.467
ColinBurnett: 7.995
<tag>Si</tag>: 19.014
 <tag>Si</tag> : 19.201
Si: 15.567

Test 4 took too long: after 30 minutes it was deemed too slow. To demonstrate how much slower it was, here are the same tests run only 1,000 times each.

<tag>base</tag>: 0.004
 <tag>base</tag> : 0.004
base: 0.047
<tag>ColinBurnett</tag>: 0.003
 <tag>ColinBurnett</tag> : 0.016
ColinBurnett: 0.008
<tag>Si</tag>: 0.021
 <tag>Si</tag> : 0.017
Si: 0.014
<tag>RashmiPandit</tag>: 3.456
 <tag>RashmiPandit</tag> : fail
RashmiPandit: 0
Done.

Extrapolating out to a million executions, it would've taken 3456 seconds, or just over 57 minutes.

This is a good example of why complex regexes are a bad idea if you're looking for efficient code. However, it also shows that a simple regex can still be a good answer in some cases: the small XML 'pre-test' in Colin Burnett's answer made some XML inputs more expensive (test 2 constructed a new Regex whenever the string didn't start with '<'), but it made the non-XML case much cheaper by avoiding the exception.
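If a complex pattern has to stay, one defensive option (available since .NET Framework 4.5, and not something this benchmark measured) is to give the Regex a match timeout, so a pathological input fails fast with a RegexMatchTimeoutException instead of running for minutes. The class name, the one-second limit, and treating a timeout as plain text are all assumptions in this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern.
public static class GuardedXmlRegex
{
    private const string XML_PATTERN =
        @"^\<{1}(?<xmlTag>\w+)(((\w|\W)*\>{1}(?<data>.*)\<{1}/{1}\k<xmlTag>)|(\s*/{1}))\>{1}$";

    // Built once (static field); the timeout bounds the cost of any single match.
    private static readonly Regex XmlRegex =
        new Regex(XML_PATTERN, RegexOptions.Compiled, TimeSpan.FromSeconds(1));

    public static string ExtractValue(string data)
    {
        try
        {
            Match m = XmlRegex.Match(data);
            // For a self-closing tag the "data" group does not participate, so Value is "".
            return m.Success ? m.Groups["data"].Value : data;
        }
        catch (RegexMatchTimeoutException)
        {
            return data; // assumption: "too slow to decide" is treated as plain text
        }
    }
}
```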

Try it again with the Regex instances created only once by using a static field to hold them. I would venture that a non-trivial portion of the time is in instantiating and compiling the regexes repeatedly. — Colin Burnett, May 21 '09 at 17:38
Actually, the regex in my example is entirely unnecessary with `if (data[0] == '<' || data.TrimStart()[0] == '<')`. The latter is exactly what "^[ \t\r\n]*<" was looking for. — Colin Burnett, May 21 '09 at 18:03

score 3 · Answer 4 · answered May 21 '09 at 03:12

I find that a perfectly acceptable way of handling your situation (it's probably the way I'd handle it as well). I couldn't find any sort of "XElement.TryParse(string)" in MSDN, so the way you have it would work just fine.

answered May 21 '09 at 03:12 by Mark Carpenter

score 3 · Answer 5 · answered May 21 '09 at 08:02

There is no way of validating that the text is XML other than doing something like XElement.Parse. If, for example, the very last close-angle-bracket is missing from the text field, then it's not valid XML, and it's very unlikely that you'll spot this with RegEx or text parsing. There are a number of illegal characters, illegal sequences, etc. that RegEx parsing will most likely miss.

All you can hope to do is to short cut your failure cases.

So, if you expect to see lots of non-XML data and the less-expected case is XML then employing RegEx or substring searches to detect angle brackets might save you a little bit of time, but I'd suggest this is only useful if you're batch processing lots of data in a tight loop.

If, instead, this is parsing user-entered data from a web form or a WinForms app, then I think paying the cost of the exception might be better than spending the dev and test effort ensuring that your shortcut code doesn't generate false positive/negative results.

It's not clear where you're getting your XML from (file, stream, textbox or somewhere else) but remember that whitespace, comments, byte order marks and other stuff can get in the way of simple rules like "it must start with a <".
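As a quick illustration of the declaration and whitespace point (a sketch; the helper name is hypothetical): an XML declaration and comments both begin with '<', so a "starts with <" pre-check lets them through, and XElement.Parse skips them on its way to the root element. Whitespace before an XML declaration, however, survives a trim-then-check-for-'<' pre-test but is rejected by the parser, which is why the parse, not the pre-check, has to make the final call.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class PrologDemo
{
    // Hypothetical helper: the root element's value, or the input unchanged if it won't parse.
    public static string ValueOrSelf(string data)
    {
        try
        {
            return XElement.Parse(data).Value;
        }
        catch (XmlException)
        {
            return data;
        }
    }
}
```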

score 2 · Answer 6 · edited May 23 '17 at 11:58

As noted by @JustEngland in the comments, exceptions are not that expensive. A debugger intercepting them might take time, but normally they perform well and are good practice. See "How expensive are exceptions in C#?".

A better way would be to roll your own TryParse style function:

using System.Xml.Linq;

[System.Diagnostics.DebuggerNonUserCode]
static class MyXElement
{
    public static bool TryParse(string data, out XElement result)
    {
        try
        {
            result = XElement.Parse(data);
            return true;
        }
        catch (System.Xml.XmlException)
        {
            result = default(XElement);
            return false;
        }
    }
}

The DebuggerNonUserCode attribute makes the debugger skip the caught exception to streamline your debugging experience.

Used like this:

using System;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        var addressList = "line one~line two~line three~postcode";

        var address = new XElement("Address");
        var addressHtml = "<span>" + addressList.Replace("~", "<br />") + "</span>";

        XElement content;
        if (MyXElement.TryParse(addressHtml, out content))
            address.ReplaceAll(content);
        else
            address.SetValue(addressHtml);

        Console.WriteLine(address.ToString());
        Console.ReadKey();
    }
}

I'd have preferred to create an extension method for the TryParse, but you can't create a static one called on a type rather than an instance.
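Extension methods attach to instances, so there is no way to add an XElement.TryParse(...) callable on the type. A common workaround, sketched here with a hypothetical class and method name, is an extension on the string instead, reusing the same try/catch and DebuggerNonUserCode idea as MyXElement.

```csharp
using System.Diagnostics;
using System.Xml;
using System.Xml.Linq;

// Hypothetical extension class; the names are illustrative.
[DebuggerNonUserCode]
public static class XmlStringExtensions
{
    // Usage (C# 7+): if (text.TryParseXElement(out var element)) { ... }
    public static bool TryParseXElement(this string data, out XElement result)
    {
        if (data == null)
        {
            // XElement.Parse(null) would throw ArgumentNullException, not XmlException
            result = null;
            return false;
        }

        try
        {
            result = XElement.Parse(data);
            return true;
        }
        catch (XmlException)
        {
            result = null;
            return false;
        }
    }
}
```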

It would be much nicer to have a TryParse in the framework, but this is a useful trick to clean up the debugging experience. — eisenpony, Nov 13 '18 at 19:25

score 1 · Answer 7 · answered May 21 '09 at 03:23

Why is regex expensive? Doesn't it kill 2 birds with 1 stone (match and parse)?

Simple example parsing all elements, even easier if it's just one element!

using System.Text.RegularExpressions;

Regex regex = new Regex(@"<(?<tag>\w*)>(?<text>.*)</\k<tag>>");
MatchCollection matches = regex.Matches(data);
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    string name = groups["tag"].Value;
    string value = groups["text"].Value;
    // ...
}

answered May 21 '09 at 03:23 by si618

Just to note, that doesn't verify that the xml is valid (it could be invalid in the text portion) – cyberconte May 21 '09 at 03:44

It also doesn't determine that all tags have been closed (making it invalid XML) – Richard Szalay Sep 06 '11 at 08:15

score 0 · Answer 8 · answered Jun 09 '15 at 07:29

I am not exactly sure whether your requirement involves the file format, but since this question was asked a long time ago and I happened to be searching for something similar, I'd like to share what worked for me in case it helps anyone who comes here :)

We can use Path.GetExtension(filePath) to check whether the file has an XML extension; if it does, use it, otherwise do whatever is required.
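A minimal sketch of that check, assuming the input is a file path (the class and method names are hypothetical). Path.GetExtension returns the extension including the leading period, so the comparison is against ".xml", and it is done case-insensitively so that "DATA.XML" also qualifies. This only looks at the file name, not at whether the contents are well-formed.

```csharp
using System;
using System.IO;

public static class XmlFileCheck
{
    public static bool HasXmlExtension(string filePath)
    {
        // GetExtension returns ".xml" (with the dot), "" when there is no extension,
        // and null for a null path; string.Equals handles all three.
        return string.Equals(Path.GetExtension(filePath), ".xml", StringComparison.OrdinalIgnoreCase);
    }
}
```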

score 0 · Answer 9 · answered Aug 19 '22 at 21:46

This is a pretty old question and answer, but still a valid concern :-)

Here's a slightly more streamlined version of the accepted answer also wrapped into a custom extension for easy use with any string:

public static bool IsDuckTypedXml(this string xmlText)
{
    if (string.IsNullOrWhiteSpace(xmlText))
        return false;

    var text = xmlText.Trim();
    // Plain character checks: no culture-sensitive string comparison involved
    return text[0] == '<' && text[text.Length - 1] == '>';
}

score 0 · Answer 10 · answered May 21 '09 at 03:12

Clue -- all valid xml must start with "<?xml "

You may have to deal with character set differences but checking plain ASCII, utf-8 and unicode will cover 99.5% of the xml out there.

answered May 21 '09 at 03:12 by James Anderson

The <?xml ?> declaration is not required, and all standard parsers (C#, PHP, Python, etc.) I've used happily parse without one. – Colin Burnett May 21 '09 at 03:16

What Colin said + the OP is very likely to work with XML fragments – annakata May 21 '09 at 08:04

score 0 · Answer 11 · answered May 21 '09 at 04:04

The way you suggest will be expensive if you use it in a loop where most of the XML strings are not valid. For valid XML, your code runs as if there were no exception handling at all. So if your XML is valid in most cases, or you don't use this in a loop, your code will work fine.

score 0 · Answer 12 · answered May 21 '09 at 07:45

If you want to know if it's valid, why not use the built-in .NET Framework object rather than write one from scratch?

Hope this helps,

Bill

answered May 21 '09 at 07:45 by V'rasana Oannes

score 0 · Answer 13 · answered May 21 '09 at 17:58

A variation on Colin Burnett's technique: you could do a simple regex at the beginning to see if the text starts with a tag, then try to parse it. Probably >99% of strings you'll deal with that start with a valid element are XML. That way you could skip the regex processing for full-blown valid XML and also skip the exception-based processing in almost every case.

Something like ^<[^>]+> would probably do the trick.
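A sketch of that variation, using the suggested pattern as given (the class and method names are hypothetical). Because the pattern is anchored with ^ and has no \s*, input with leading whitespace is treated as plain text here; adding \s* after the ^ would change that if it matters for your data.

```csharp
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class TagPrefixCheck
{
    // Pattern suggested above: the text must open with something that looks like a tag.
    private static readonly Regex StartsWithTag = new Regex(@"^<[^>]+>", RegexOptions.Compiled);

    public static string ExtractValue(string data)
    {
        if (data == null || !StartsWithTag.IsMatch(data))
            return data; // cheap rejection: no exception thrown

        try
        {
            return XElement.Parse(data).Value;
        }
        catch (XmlException)
        {
            return data; // looked like a tag, but is not well-formed
        }
    }
}
```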

score -2 · Answer 14 · answered Sep 14 '11 at 15:21

How about this: take your string or object and toss it into a new XDocument or XElement. Everything resolves using ToString().

answered Sep 14 '11 at 15:21 by Crazy T-Mack
