0

I need to read an HTML file and store its content in a string.

From this

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Index</title>
</head>
<body>
    Index
</body>
</html>

To an output like this

$stringValue = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"...

I've tried $stringValue = $htmlFile | ConvertTo-Json, but it escapes some characters as Unicode codes (for example, > becomes \u003e), and I want to keep the special characters intact.

Any help is appreciated.

Marcus Höglund

2 Answers

3

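You can use the command below to read the contents of the HTML file and store them in a string variable. The -Raw switch (PowerShell 3.0 and later) returns the file as one string; without it, Get-Content returns an array of lines, and the [string] cast joins them with spaces instead of line breaks.

[string]$Datas = Get-Content -Raw [HTML_file_Location]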
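
As a minimal sketch of the approach above, the following writes a small sample file and reads it back both as a single string and as an array of lines. The temp file name, the sample markup, and the variable names $path, $raw, $lines, and $stringValue are assumptions made for illustration; the last step is one possible way to get backslash-escaped quotes like the output in the question without ConvertTo-Json touching < or >.

```powershell
# Hypothetical sample file so the sketch runs on its own; replace with your real path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'index-sample.html'
Set-Content -Path $path -Encoding UTF8 -Value @'
<html>
<body class="main">
    Index
</body>
</html>
'@

# -Raw (PowerShell 3.0+) returns the whole file as one string, line breaks included.
$raw = Get-Content -Path $path -Raw -Encoding UTF8

# Without -Raw, Get-Content returns one string per line.
$lines = Get-Content -Path $path -Encoding UTF8

# Optional: wrap the text in double quotes and backslash-escape the inner ones.
# Characters such as < and > are left untouched.
$stringValue = '"' + $raw.Replace('"', '\"') + '"'
```

If only the raw file content is needed, $raw is already the string; the escaping step matters only when the text has to be embedded in a quoted literal.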
Vincent K
Deepesh
0

Try reading the file as UTF-16 and check whether the output comes through as desired. The following answer shows how to read UTF-16 text:

Reading a "string in little-endian UTF-16 encoding" with BinaryReader
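
In PowerShell, a rough equivalent that avoids BinaryReader might look like the sketch below. It assumes the file really is little-endian UTF-16, which the question does not confirm; the temp file name and the variables $path and $text are hypothetical.

```powershell
# Hypothetical sample: write a small UTF-16 LE file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'index-utf16-sample.html'
[System.IO.File]::WriteAllText($path, '<title>Index</title>', [System.Text.Encoding]::Unicode)

# [System.Text.Encoding]::Unicode is .NET's name for little-endian UTF-16.
# ReadAllText returns the whole file as one string.
$text = [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::Unicode)
```

Get-Content -Raw -Encoding Unicode reads the same encoding if you prefer to stay with cmdlets.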

MwBakker