I'm trying to store text I read from a file to a string variable:

using HtmlAgilityPack;

// .
// .
// some other code
// .
// .

// The following line's output is as expected. The contents of the file are printed to the console.
Console.Write( File.ReadAllText( parentFolder + @"\" + file ) );

// (storing the text in a variable)
node.InnerHtml = File.ReadAllText( parentFolder + @"\" + file );

// The output of the following line is different. The spaces and new lines become ="" (equal symbol + 2 sets of quotation marks + a space)
Console.Write( node.InnerHtml );


// example output of Console.Write( File.ReadAllText( parentFolder + @"\" + file ) );
// 'use strict';
//
// module.exports = somevariable;

// example output of Console.Write( node.InnerHtml );
// 'use="" strict';="" module.exports="somevariable;

What could be causing this? And how can it be fixed?

Xel
  • Well, HTML doesn't understand newlines in the sense that a text file uses. A newline in HTML is `<br>`. My guess is that HtmlAgilityPack is removing whitespace because it's meaningless in HTML. – ProgrammingLlama Sep 08 '21 at 05:47
  • https://stackoverflow.com/questions/2965497/what-is-the-difference-between-file-readalllines-and-file-readalltext – Beltway Sep 08 '21 at 05:48
  • It doesn't look like your text is actually HTML. Note that you're *not* storing it in a string variable as per your question - you're storing it in the `HtmlNode.InnerHtml` property, which performs HTML-oriented transformations such as collapsing insignificant whitespace. – Jon Skeet Sep 08 '21 at 05:48
  • @Beltway That...isn't related to this question at all. – ProgrammingLlama Sep 08 '21 at 05:49
  • @Llama Using `readAllLines` and manually inserting linebreaks partially resolves the issue. – Beltway Sep 08 '21 at 05:52
  • @Beltway How would you partially insert linebreaks into Javascript? – ProgrammingLlama Sep 08 '21 at 05:52
  • Please review [MCVE] guidelines on posting code - in particular it is very unclear why you must use `ReadAllText` in the sample code posted in the question. You should be able to debug your code, figure out what is the content of the file and [edit] the question with inline string literal that demonstrates the problem. – Alexei Levenkov Sep 08 '21 at 05:55
  • @LlamaNot The partially refers to resolve, as in this does not solve spaces becoming an empty string. – Beltway Sep 08 '21 at 05:56

1 Answer

Your issue here is that newlines (in the sense of \n or \r\n), and whitespace in general, have little meaning in HTML: browsers render any run of whitespace as no more than a single space. So <div>a b</div> would be rendered the same as <div>a      b</div>, etc. It seems that HtmlAgilityPack is simply tidying up the "HTML" (actually JavaScript code) that you give it.

If I understand you correctly, it seems that you want to put some code inside a tag in the HTML (e.g. a script tag). To do so, we need to treat the code as text and construct a text node:

string script = File.ReadAllText( parentFolder + @"\" + file );
HtmlTextNode textNode = doc.CreateTextNode(script);

We can then append that as a child to the node in question:

node.AppendChild(textNode);

This will retain the newlines in your text file, as we're not falsely declaring that it's HTML.


P.S. If there is existing text within the node, you might have to clear that out first. You can do that by calling the following before node.AppendChild(textNode):

node.RemoveAllChildren();
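
Putting the pieces above together, a minimal sketch might look like the following. The sample HTML, the //script XPath, and the inline script string (standing in for the File.ReadAllText call) are assumptions made for illustration; they are not taken from the question's actual document.

```csharp
using System;
using HtmlAgilityPack;

// Stand-in for File.ReadAllText(...): text containing spaces and a blank line.
string script = "'use strict';\n\nmodule.exports = somevariable;\n";

// Hypothetical document with a script tag that already has some content.
var doc = new HtmlDocument();
doc.LoadHtml("<html><body><script>old();</script></body></html>");
HtmlNode node = doc.DocumentNode.SelectSingleNode("//script");

// Clear any existing children, then add the script as a text node
// so that it is treated as text rather than parsed as markup.
node.RemoveAllChildren();
HtmlTextNode textNode = doc.CreateTextNode(script);
node.AppendChild(textNode);

// Write the resulting document to the console for inspection.
Console.Write(doc.DocumentNode.OuterHtml);
```

Assigning to node.InnerHtml asks the library to parse the string as HTML, whereas a text node carries the string as-is, which is why this route is suggested here.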
ProgrammingLlama