I am reading a TXT file in my Angular application. This TXT file contains some characters in UTF-8, such as "Prénom annuaire".

When I read this file and print the data to the console, it displays as "Pr�nom annuaire". I am a bit confused by this behavior. Below is my code to read the data:

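const reader = new FileReader();
// reader.result holds the decoded text once loading has finished
reader.onloadend = function () { console.log(reader.result); };
// the second argument is the encoding used to decode the file's bytes
reader.readAsText(evt.target.files[0], "UTF-8");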

Can anyone help? How can I read the data as UTF-8?

Edit: 1

When I compare the strings in my application, "Pr�nom annuaire" and "Prénom annuaire" are treated as two different strings.

Edit: 2

The problem I am facing because of this behavior:

I have a JSON object fetched from server as below:

{"First_Name": "Prénom annuaire",...}

Then I read the TXT file as mentioned above, and for each text I read from the file I need to fetch the corresponding key from the above JSON. So when I read 'Prénom annuaire' from the TXT file, I should look this value up in the above JSON and get "First_Name" back.

But when I compare the 'Prénom annuaire' text read from the file with the JSON, I am unable to get the corresponding key, i.e. 'First_Name'. When I debug, the text read from the TXT file shows as 'Pr�nom annuaire' and does not match 'Prénom annuaire'.

So, how could I do this comparison?

pratRockss
  • 1
    You say `it writes as "PrÃ©nom annuaire"`. What are you using to _look_ at the file? Because that actually seems _correct_; that's what "Prénom annuaire" looks like in UTF-8... if you're viewing that UTF-8 as ANSI. It seems like the issue is not in the reading, but in whatever you are _displaying_ the string with. – Nyerguds Feb 25 '19 at 10:24
  • I am using Excel to view the file. If I want to view it as "Prénom annuaire", how can I do that? Also, why is it printed as "Pr�nom annuaire" in the console? – pratRockss Feb 25 '19 at 10:29
  • 1
    If it's a CSV, open it specifically as a UTF-8 file (or use something like Notepad++ to check the stuff in it). I think Excel only detects UTF-16 automatically. As for the console... do you mean the Windows console? I'm not sure if it can handle non-ANSI at all. – Nyerguds Feb 25 '19 at 10:47
  • Not the Windows console; I mean the Chrome console or any other browser console. – pratRockss Feb 25 '19 at 10:49
  • I opened the file with UTF-8 encoding, and now it shows correctly. But when I read the file with content 'Prénom annuaire', I get the value "Pr�nom annuaire". A comparison of 'Prénom annuaire' and "Pr�nom annuaire" then reports them as different strings. – pratRockss Feb 25 '19 at 10:57
  • It's still entirely possible the output console is simply the thing that doesn't support UTF-8. If you manage to read it correctly, and display it correctly _in your actual program_, there should not be a problem, no matter what your debugging results show you. And if it doesn't show correctly in your program, you probably have to browse around in the Angular settings to ensure the page character set is UTF-8. – Nyerguds Feb 26 '19 at 11:15
  • @Nyerguds I have updated 'Edit: 2' with the problem I am facing. Could you help with that? – pratRockss Feb 27 '19 at 06:32
  • JSON is always UTF-8. You just need to make sure everything is read in its intended text encoding so the internal objects end up the same. – Nyerguds Feb 27 '19 at 08:06
  • Yes. How to read the text file so that both end up the same is basically my question. – pratRockss Feb 27 '19 at 10:08
  • Are you sure your text file is UTF-8, though? You should check that with an application like Notepad++ or Editpad Pro, and/or a hex editor. Because, if you read as UTF-8 and get "�", this usually indicates a byte sequence that is illegal for a UTF-8 character. – Nyerguds Feb 27 '19 at 11:08
  • When I open the text file, I can see the text as 'Prénom annuaire'. Does that mean the file is UTF-8? – pratRockss Feb 27 '19 at 12:01
  • No. The only thing that means is that the program you use has correctly _detected_ the text encoding. If it's Notepad++ or Editpad Pro, [you should see it at the bottom](https://i.stack.imgur.com/uQCGf.png). Since you don't seem to really understand what all this text encoding stuff means, I advise you to read [this article](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/). – Nyerguds Feb 27 '19 at 12:56
  • I can see that the file encoding is ANSI. I have another file with the same content but encoded as UTF-8, which works perfectly fine for me. But when I open this file for editing in Excel with the file origin set to UTF-8 and save it, the encoding of the file changes to ANSI, which causes the problem in my application. It's really confusing for me. Could you help me out here? – pratRockss Feb 28 '19 at 06:52
  • You can look into detecting encoding... UTF-8 files often start with a 'byte order mark' that indicates that the file is UTF-8, and even if it doesn't, UTF-8 [follows strict rules in its byte sequences](https://stackoverflow.com/a/33681871/395685), and that can be used to see when something is _not_ UTF-8. I've only done that in C#, though, and in C# there's a much simpler method since a `UTF8Encoding` object can specifically be initialised to throw an exception when decoding fails, which makes it ridiculously easy to detect... I have no idea if a similar easy trick exists for your case. – Nyerguds Mar 01 '19 at 07:39
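
For reference, browsers and Node.js appear to offer a comparable trick: a `TextDecoder` created with `{ fatal: true }` throws a `TypeError` on malformed input instead of substituting "�". The following is only a minimal sketch of that idea, not a definitive fix. The helper names `decodeText` and `keyForValue` are hypothetical, and the fallback assumes that "ANSI" here means windows-1252, which may not hold for every file.

```javascript
// Sketch only: decodeText and keyForValue are hypothetical helper names.

// Decode raw bytes as strict UTF-8; if the bytes are not valid UTF-8,
// assume windows-1252 (the encoding Windows tools usually label "ANSI").
function decodeText(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on malformed input
    // instead of silently substituting U+FFFD (the "�" character).
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { text, encoding: "utf-8" };
  } catch (err) {
    // Assumption: any input that is not valid UTF-8 is windows-1252.
    const text = new TextDecoder("windows-1252").decode(bytes);
    return { text, encoding: "windows-1252" };
  }
}

// Return the first key whose value equals the given text, or undefined.
function keyForValue(obj, value) {
  return Object.keys(obj).find((key) => obj[key] === value);
}

// "Prénom annuaire" saved as ANSI: "é" is the single byte 0xE9.
const ansiBytes = new Uint8Array([
  0x50, 0x72, 0xe9, 0x6e, 0x6f, 0x6d, 0x20,
  0x61, 0x6e, 0x6e, 0x75, 0x61, 0x69, 0x72, 0x65,
]);
// The same text saved as UTF-8: "é" becomes the two bytes 0xC3 0xA9.
const utf8Bytes = new TextEncoder().encode("Prénom annuaire");

// Stand-in for the JSON object fetched from the server.
const labels = { First_Name: "Prénom annuaire" };

const fromAnsi = decodeText(ansiBytes);
const fromUtf8 = decodeText(utf8Bytes);
console.log(fromAnsi.encoding, keyForValue(labels, fromAnsi.text));
console.log(fromUtf8.encoding, keyForValue(labels, fromUtf8.text));
```

In the application itself, the bytes would presumably come from `reader.readAsArrayBuffer(evt.target.files[0])` rather than `readAsText`, with `new Uint8Array(reader.result)` passed to `decodeText` inside the `onloadend` handler. Decoding the raw bytes this way means both the ANSI and the UTF-8 version of the file should yield the same string, so the lookup against the JSON values can succeed in either case.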

0 Answers