Can LINQ to XML be used for any node-based file?

Question

Can I use LINQ to XML in any type of node based text file (e.g. *.xml, *.html, *.txt or whatever the extension of the file may be, they all are node based)?

If not, what is the best alternative? Below are a few things that I want to do with my files.

Get data from the nodes and use it in other files or in the same file.
Modify node contents/attributes.
Relocate, remove, or add nodes/data.

HTML is **not** XML. And what do you mean with txt-files are node based? — Manfred Radlwimmer, Sep 26 '17 at 12:21
I meant that the files can have different extensions and not necessarily *.xml all the time but they are like an xml file just with different extension. — Bumba, Sep 26 '17 at 12:26
Then I would remove that from the question. The way it is now it looks like you are looking for a way to parse xml, html and text files with LINQ to XML. — Manfred Radlwimmer, Sep 26 '17 at 12:28

Flater · Answer 1 · 2017-09-26T12:32:08.920

HTML, while it resembles XML, is not actual XML.

This answer lists counterexamples showing why valid HTML can be invalid XML. A shortened summary:

Some closing tags can be omitted.
<script> escape magic
Attributes without values (boolean attributes)
Attributes without quotes
Implicit open elements and multiple top level elements.

If any of these things are found in your HTML file, then it is valid HTML but invalid XML, which means that you cannot parse it as if it were XML.
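
As a rough illustration of that point, here is a minimal sketch in Python rather than .NET, chosen only because its standard library ships both a strict XML parser and a lenient HTML parser. The sample markup and the names `HTML_SNIPPET`, `TagCollector`, and `parses_as_xml` are made up for this example. A strict .NET XML parser such as `XDocument.Parse` should reject the same input in a similar way.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: valid HTML that uses an unquoted attribute value,
# a boolean attribute, and an omitted </p> closing tag.
HTML_SNIPPET = "<div><input type=checkbox checked><p>Hello</div>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(HTML_SNIPPET)
collector.close()

print(parses_as_xml(HTML_SNIPPET))  # strict XML parsing of the HTML sample
print(collector.tags)               # start tags seen by the HTML parser
```

The XML parser gives up at the first unquoted attribute, while the HTML parser walks the whole fragment. This is why a dedicated HTML parsing library is the usual choice for real-world HTML.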

(e.g. *.xml, *.html, *.txt or whatever the extension of the file may be, they all are node based)

You're correct when you say that the file extension has no bearing on something being considered correct XML. Only the contents of the file are relevant.

A file extension is relatively meaningless, at least from a technical perspective. The only functional value of a file extension is that it allows Windows to identify what application it should use (by default) when you try to open the file.
As far as your code is concerned, the file extension (or lack thereof) is irrelevant (other than needing it as part of the file path, to find the file on disk, of course).
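
A small sketch of the same idea, again in Python as a language-neutral stand-in: well-formed XML saved under a `.txt` name parses just like a `.xml` file would. The file name `books.txt`, the sample content, and the helper `load_titles` are hypothetical. In LINQ to XML the analogous step would be `XDocument.Load(path)`, which takes a path regardless of its extension.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample content: well-formed XML.
XML_TEXT = "<books><book id='1'>Dune</book><book id='2'>Emma</book></books>"


def load_titles(path):
    """Parse the file at `path` as XML and return the text of each <book>."""
    root = ET.parse(path).getroot()
    return [book.text for book in root.findall("book")]


# Save the XML under a .txt name; the parser only looks at the contents.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(XML_TEXT)
    titles = load_titles(path)

print(titles)
```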

If not, what is the best alternative? Below are a few things that I want to do with my files.

Suggesting resources is out of scope for Stack Overflow. Google is your friend: look for libraries that help you parse HTML.
