
I have a JSON file in my application, which I can read fine as a UTF-8 string:

    using System.IO;

    using (StreamReader file = File.OpenText(filePath))
    {
        // File.OpenText decodes as UTF-8 (a byte order mark, if present, can override this)
        string json = file.ReadToEnd();
    }

Some of the RegEx patterns in the file contain special characters such as /:@~{+_&%$, and these all work fine.

However, there are also instances of the £ sign inside a RegEx in the JSON file. When the file is read with the code above (UTF-8 by default), the £ character comes out in the string as a black diamond with a white question mark in the middle (the Unicode replacement character, U+FFFD). As a result, some conditions fail because the RegEx is no longer correct.

The reason is the encoding: UTF-8 can't understand the £ as it is stored in the file, because (as far as I know) the file should be read as ISO-8859-1.

Now, when I change my code to read the JSON file using that encoding:

    using (StreamReader file = new StreamReader(entityFilePath, Encoding.GetEncoding("iso-8859-1")))
    {
        string json = file.ReadToEnd();
    }

I get the correct £ character in my string, inside the RegEx.
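A minimal C# sketch of what is probably happening, assuming the file stores £ as the single ISO-8859-1 byte 0xA3 (which would explain why the ISO-8859-1 reader works). UTF-8 encodes £ as the two bytes C2 A3, so a lone A3 is not valid UTF-8 and the default UTF-8 decoder substitutes U+FFFD. The class and member names are illustrative only.

```csharp
using System.Text;

// Compares how the byte 0xA3 decodes under UTF-8 and under ISO-8859-1.
public static class PoundByteDemo
{
    // The single byte an ISO-8859-1 (or Windows-1252) editor writes for '£'.
    public static readonly byte[] Latin1Pound = { 0xA3 };

    // The two bytes a UTF-8 editor writes for '£'.
    public static byte[] Utf8Pound() => Encoding.UTF8.GetBytes("\u00A3");

    // Encoding.UTF8 replaces invalid byte sequences with U+FFFD rather than throwing.
    public static string DecodeAsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // ISO-8859-1 maps each byte 0x00-0xFF directly to U+0000-U+00FF.
    public static string DecodeAsLatin1(byte[] bytes) =>
        Encoding.GetEncoding("iso-8859-1").GetString(bytes);
}
```

Here `DecodeAsUtf8(Latin1Pound)` yields the black-diamond character U+FFFD, `DecodeAsLatin1(Latin1Pound)` yields £, and `Utf8Pound()` returns the bytes C2 A3.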

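However, if I ever want to use other non-ASCII characters such as ÁÉÍÓÚáéíóú in my JSON file, reading it as ISO-8859-1 will cause them to be decoded incorrectly (for example, if the file is saved as UTF-8, each of these characters is stored as two bytes, which ISO-8859-1 turns into two separate characters).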
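The opposite failure can be sketched the same way: encode the text as UTF-8 (as an editor saving in UTF-8 would) and decode the bytes as ISO-8859-1, which is what a `StreamReader` given the wrong encoding does. `MojibakeDemo` and `RoundTrip` are hypothetical helper names.

```csharp
using System.Text;

// Simulates saving text with one encoding and reading it back with another.
public static class MojibakeDemo
{
    public static string RoundTrip(string text, Encoding saveAs, Encoding readAs)
    {
        // GetBytes never writes a byte order mark, so these are just the text's bytes.
        byte[] saved = saveAs.GetBytes(text);
        return readAs.GetString(saved);
    }
}
```

For example, `RoundTrip("Á", Encoding.UTF8, Encoding.GetEncoding("iso-8859-1"))` returns Ã followed by the invisible control character U+0081. All of ÁÉÍÓÚáéíóú exist in ISO-8859-1, so they only break when the bytes on disk and the reader's encoding disagree; reading with the same encoding that was used to save returns the text unchanged.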

My question is: how do I safely and reliably read my JSON file so that all the text and all the characters, including the £ sign, come through intact?

Kind regards

Kev
  • The question is, which encoding was chosen when the file was created? – Lasse V. Karlsen Oct 05 '16 at 16:50
  • Why not use this: `StreamReader file = new StreamReader(filePath, Encoding.UTF8);` – jdweng Oct 05 '16 at 16:54
  • Either the file was written with no consistent encoding (i.e., it's broken) or you don't know the encoding with which it was written. Try cycling through all encodings with [`Encoding.GetEncodings().Select(i => i.GetEncoding())`](https://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings(v=vs.110).aspx) to see if there is one that works for all characters. – dbc Oct 05 '16 at 17:03
  • Hi Lasse V, I created the JSON file within Visual Studio 2015 by just adding a new item -> JSON file. To be honest, I never gave the encoding much thought; I assumed that because the ReadToEndAsync method used UTF-8, the new file would be the same. I might try the suggestion here: http://stackoverflow.com/questions/18627694/how-to-insert-a-symbol-pound-euro-copyright-into-a-textbox – Kev Oct 05 '16 at 19:05
  • If you created the file yourself, you might want to make sure that Visual Studio creates files in UTF-8. To do so, see [How to set standard encoding in Visual Studio](https://stackoverflow.com/questions/696627) or [Save all files in Visual Studio project as UTF-8](https://stackoverflow.com/questions/279673). – dbc Oct 05 '16 at 19:20
  • Note that [rfc7159](https://tools.ietf.org/html/rfc7159) states: *JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.* – roeland Oct 05 '16 at 23:37
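Following the RFC note above, one option is to treat the file as UTF-8 but decode it strictly, so invalid bytes raise `DecoderFallbackException` instead of silently becoming U+FFFD. The fallback to ISO-8859-1 is an assumption for legacy files saved in that encoding, and it is only a heuristic: it relies on non-ASCII ISO-8859-1 text rarely happening to be valid UTF-8. `JsonFileReader`, `Decode` and `ReadJsonFile` are hypothetical names, not an existing API.

```csharp
using System.IO;
using System.Text;

public static class JsonFileReader
{
    // UTF-8 decoder configured to throw DecoderFallbackException on invalid
    // bytes instead of silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Assumption: files that are not valid UTF-8 are legacy ISO-8859-1 files.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static string Decode(byte[] bytes)
    {
        // Skip a UTF-8 byte order mark (EF BB BF) if the editor wrote one;
        // Encoding.GetString would otherwise keep it as U+FEFF.
        bool hasBom = bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        int offset = hasBom ? 3 : 0;
        try
        {
            return StrictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: decode every byte as ISO-8859-1 instead.
            return Latin1.GetString(bytes);
        }
    }

    public static string ReadJsonFile(string path) => Decode(File.ReadAllBytes(path));
}
```

With this, the reading code from the question becomes `string json = JsonFileReader.ReadJsonFile(filePath);`. Re-saving the file as UTF-8 (see the Visual Studio links above) and keeping only the strict path is the simpler long-term fix; the fallback just keeps old files readable.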

1 Answer


Answer found on this post from Timothy Shields:

How to insert a Symbol (Pound, Euro, Copyright) into a Textbox

\u00A3 is the Pound sign, £.

I used the above escape sequence in the JSON file instead of the literal £ character, and all tests passed.
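A sketch of why the escape works, assuming System.Text.Json (.NET Core 3.0 or later); any conforming JSON parser decodes `\u00A3` the same way. The parser turns the escape into £ when it reads the string, and because the file then contains only ASCII bytes, it decodes identically whether it is read as UTF-8 or ISO-8859-1. The JSON content and the `pattern` property are made up for illustration.

```csharp
using System.Text;
using System.Text.Json;

public static class EscapedPoundDemo
{
    // Hypothetical JSON config: the regex stores '£' as the JSON escape \u00A3,
    // so the text itself contains only ASCII characters.
    public const string Json = "{\"pattern\":\"^\\u00A3[0-9]+$\"}";

    // Extracts the "pattern" property; the JSON parser turns \u00A3 into '£'.
    public static string ReadPattern(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("pattern").GetString() ?? string.Empty;
        }
    }

    // Saves the text as UTF-8, then checks whether reading it back as UTF-8
    // and as ISO-8859-1 gives the same string (true only for ASCII-only text).
    public static bool SameUnderBothEncodings(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes) ==
               Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

`ReadPattern(Json)` returns `^£[0-9]+$`, which matches a string such as `£250` with `Regex.IsMatch`, and `SameUnderBothEncodings(Json)` is true, whereas it is false for a literal `"£"`.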

Kev