3

I'm stuck on a subtle problem. I'm trying to build a C# 4.0 console application that reads an XML file.

The XML file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml:stylesheet type='text/xsl' href='report.xsl' version='1.0'?>
...
<logs>
...
</logs>

And this is my code:

...
var root = XDocument.Load(xmlStream);

IEnumerable<XElement> address =
    from el in root.Descendants("formated-text")
    select el;
...

This gives me the following error at the Load method:

The ':' character, hexadecimal value 0x3A, cannot be included in a name. Line 2, position 6.

Changing the colon on the second line to a '-' solves the error ... duh

What can I do in my code to read the source XML without having to replace that 'stupid' colon first?

Thank you!

DeepCore
  • Does that actually work when loaded into a browser that automatically handles PIs for stylesheets? I know browsers tend towards tolerating errors, but that seems like a case where they'd be better off just putting up an error message. – Jon Hanna Dec 14 '11 at 14:41
  • Yes, in a browser it does work with the colon ... sadly :o( – DeepCore Dec 14 '11 at 15:56
  • XML gave us a chance to start over with nice clear error messages; that was only partly taken :( – Jon Hanna Dec 14 '11 at 16:03

5 Answers

3

It looks to me like you simply have an invalid XML document. The colon should be a hyphen (as per W3C). I doubt that you'll be able to make LINQ to XML parse an invalid document - and you shouldn't try. You should fix the document instead.

Jon Skeet
3

The colon is wrong; you should be using a hyphen.

See http://www.w3.org/Style/styling-XML.en.html

Mark Dickinson
2

Nothing. That "stupid colon" is simply invalid at that position.

Daniel Hilgarth
  • Thank you all for taking the time to provide some useful clues about what is going on with these XML elements. I needed this confirmation that the creator of the source XML made a mistake with that colon. I have already implemented a plan B, until I can convince a really big department (not mine) to make the change in their application ... :o( Plan B is to read the XML file first with a stream reader and replace all 'xml:' occurrences, then feed this corrected file into my process. – DeepCore Dec 14 '11 at 16:06
  • Been there, done that :-/ It's never just the code change, either, it's the change request, authorization, release schedules... – dash Dec 14 '11 at 16:24
1

Your XSL stylesheet processing instruction is incorrect.

It should be:

<?xml-stylesheet type='text/xsl' href='report.xsl' version='1.0'?>

Try validating your XML against any number of online validators.

You can try loading the XML as a string and fixing this issue using string parsing, or you could read the original file line by line and fix any occurrences of xml:stylesheet before saving it, like the text file in this example, but it would be better to get whoever created the XML to fix it at the source.
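As a minimal sketch of the string-based option, assuming the file is small enough to hold in memory and is UTF-8 as its declaration says: the class and method names below (StylesheetPiFixer, FixStylesheetPi, LoadFixed) are hypothetical, not part of any library.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class StylesheetPiFixer
{
    // Hypothetical helper: rewrites the non-standard "<?xml:stylesheet"
    // opener to the standard "<?xml-stylesheet" and leaves everything
    // else in the text untouched.
    public static string FixStylesheetPi(string xml)
    {
        return xml.Replace("<?xml:stylesheet", "<?xml-stylesheet");
    }

    // Reads the whole file as text, repairs the processing instruction,
    // and parses the corrected text with LINQ to XML.
    public static XDocument LoadFixed(string path)
    {
        // Assumption: the file is UTF-8, which File.ReadAllText(path) expects by default.
        string xml = File.ReadAllText(path);
        return XDocument.Parse(FixStylesheetPi(xml));
    }
}
```

With this in place, the call in the question would become something like `var root = StylesheetPiFixer.LoadFixed(path);`, where `path` is whatever file the original stream was opened on. The replacement targets only the processing instruction opener rather than every "xml:" in the file, so legitimate attributes such as xml:lang or xml:space in the body are not damaged.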

dash
  • That's what I thought too: that this is an invalid element. Nevertheless, I've seen a lot of these on the internet :o( It will be like David vs Goliath to get this change implemented at the source of these XML files... I have just been assigned the task of automating some data extraction from the XMLs – DeepCore Dec 14 '11 at 15:57
  • If you are convinced you are going to be seeing a lot of these, and you can't change them, then the best thing you can do is clean them up first - hence the suggestions of manipulating the contents of the file as a string (yucky) or as a file line-by-line (more efficient, and you can bail at the first occurrence of your xml after all of the processing instructions) – dash Dec 14 '11 at 16:00
  • Fun fact: since there's no `text/xsl` but there is an `application/xslt+xml`, the "correct" way follows one standard's rules but breaks another, and hardly anything will accept the correct `application/xslt+xml` :( – Jon Hanna Dec 14 '11 at 16:11
  • That was an XSLT 2.0 standardization. By then, everyone had moved on to something other than XSLT, which is a shame as I like it and we wrote some good software powered by XSLT. It does go some way towards explaining the inconsistent implementation though. – dash Dec 14 '11 at 16:26
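The line-by-line variant suggested in the comments above might look roughly like the following sketch. PrologFixer and CopyWithFixedProlog are hypothetical names, and the check for "the first element" is a crude heuristic that assumes the prolog contains no comments or processing-instruction data with a '<' followed by a name character.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class PrologFixer
{
    // Hypothetical helper: copies input to output line by line and rewrites
    // "<?xml:stylesheet" only until the first line that appears to contain
    // an element tag. Later lines are copied through unchanged.
    public static void CopyWithFixedProlog(TextReader input, TextWriter output)
    {
        bool inProlog = true;
        string line;
        while ((line = input.ReadLine()) != null)
        {
            if (inProlog)
            {
                line = line.Replace("<?xml:stylesheet", "<?xml-stylesheet");
                if (ContainsElementTag(line))
                {
                    inProlog = false; // stop rewriting from the next line on
                }
            }
            output.WriteLine(line);
        }
    }

    // Heuristic (an assumption, not a real parser): a '<' followed by
    // anything other than '?' or '!' is treated as an element tag.
    private static bool ContainsElementTag(string line)
    {
        for (int i = 0; i + 1 < line.Length; i++)
        {
            if (line[i] == '<' && line[i + 1] != '?' && line[i + 1] != '!')
            {
                return true;
            }
        }
        return false;
    }
}
```

One way to use it is to copy from a StreamReader over the source file into a StringWriter and hand the result to XDocument.Parse, or to write to a temporary file and call XDocument.Load on that. Because rewriting stops once the body starts, any text inside the document that happens to contain "<?xml:stylesheet" is left alone.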
0

I found out that the origin of these 'malformed' XML files dates back to the mid-1990s ... yes, such an old system is still in use today and still produces this output. I can live with a workaround in my code.

Thank you for taking the time to provide some useful clues about what is going on with these XML elements.

I needed this confirmation that the creator of the source XML made a mistake with that colon.

I have already implemented a plan B, until I can convince a really big department (not mine) to make the change in their application ... :o(

Plan B is to read the XML file first and replace all 'xml:' occurrences, then feed this corrected file into my process.

DeepCore