
Context:

I work with many translated languages as part of my job (I send files to a translation firm and then receive the translated versions), so this applies to other languages as well.

  1. I am the original author of the HTML files returned.
  2. My HTML files are created in UTF-8 (VSCode is set to read and write UTF-8).
  3. The HTML files received are UTF-8.
  4. VSCode's "Guessed from content" encoding suggestion is UTF-8.
  5. UTF-8 format reads "�".
  6. Windows-1252 format reads "�". [Updated] Tried that since it is one of the solutions I found online.
  7. That same, untouched, file displays correctly on Notepad.
  8. Copy-pasting the file's content from Notepad into VSCode works.
  9. The issue happens only sometimes, when opening files received from another firm.
  10. Encoding default is already set as UTF-8 for reading and writing.
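For anyone who wants to check points 3 to 6 independently of any editor, here is a minimal Python sketch. It only inspects the raw bytes; the function names and the file name in the usage note are hypothetical, and nothing here is confirmed to match what VSCode or Notepad do internally.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"
# U+FFFD (the "�" character) as it looks once it has been saved as UTF-8.
REPLACEMENT_BYTES = "\ufffd".encode("utf-8")  # b"\xef\xbf\xbd"


def inspect_bytes(data: bytes) -> dict:
    """Report basic encoding facts about raw file bytes."""
    report = {
        "has_utf8_bom": data.startswith(UTF8_BOM),
        "valid_utf8": True,
        "first_bad_offset": None,
        # Non-zero means "�" is physically stored in the file itself.
        "literal_replacement_chars": data.count(REPLACEMENT_BYTES),
    }
    try:
        data.decode("utf-8")  # strict decoding: raises on any invalid byte
    except UnicodeDecodeError as err:
        report["valid_utf8"] = False
        report["first_bad_offset"] = err.start
    return report


def inspect_file(path: str) -> dict:
    """Same report, reading the bytes from a file on disk."""
    return inspect_bytes(Path(path).read_bytes())
```

Calling `inspect_file("translated.html")` (hypothetical file name) on a problem file and on a good one should show whether the bad file is not valid UTF-8 at all, or whether it is valid UTF-8 that already contains "�" characters.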

What I tried so far:

-> Manually changing the encoding of the file and reopening it, from UTF-8 (received format) to Windows-1252 (French) and vice versa. The issue persists: the error character remains in place under either encoding.

-> Installing the French language pack to see if it would help with at least the French projects, but it did not help. French itself is not an issue, as I work in a French-speaking environment and can type it myself.

-> The translation firm we work with says that they see the files correctly on their end.

-> My colleagues who use other code editors (like Sublime Text) do not have this issue.

-> Decoding the error text does not preserve which original character each error stands for (whereas Notepad can tell them apart).

-> This post (1) mentions that VSCode and Identity Services Engine services (CISCO probably)(2) aren't the most compatible, and it is quite probable that my workplace and/or the firm uses these services. Could the cause lie there?

(1) https://github.com/PowerShell/vscode-powershell/issues/1680#issuecomment-453200280

(2) https://www.cisco.com/site/ca/en/products/security/identity-services-engine/index.html

What can be done?

Images for support

  1. Code as seen in Notepad
  2. Code viewed from SharePoint (where the file is downloaded from)
  3. That same code displayed in VSCode
  4. It is currently UTF-8
  5. How VSCode ends up displaying the code
  6. Typing over the file with the problematic letters appears correctly.
  7. Codepoints do not recognise the different errors even though Notepad does. (left: file read by Notepad, middle: said file uploaded onto onlinetools.com, right: same error code for all 4)
  8. Comparing the error code to the supposed code
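One possible reading of the last two images, sketched in Python: once invalid bytes have been decoded to "�" (U+FFFD), every error becomes the same code point, so a code point converter cannot tell them apart. The Windows-1252 bytes below are an assumption chosen for illustration; the actual bytes in my file are unknown.

```python
# Four different accented letters as single Windows-1252 bytes.
# ASSUMPTION: illustrative values only, not taken from the real file.
samples = {"è": b"\xe8", "é": b"\xe9", "à": b"\xe0", "ç": b"\xe7"}


def lossy_decode(raw: bytes) -> str:
    """Decode as UTF-8, replacing each invalid sequence with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Wrap each byte in ASCII letters, as it would appear inside a word.
decoded = {letter: lossy_decode(b"t" + raw + b"s") for letter, raw in samples.items()}

# List the code points of every decoded string.
code_points = {
    letter: [f"U+{ord(ch):04X}" for ch in text] for letter, text in decoded.items()
}

for letter, points in code_points.items():
    print(letter, points)
```

Under that assumption, all four letters end up with an identical middle code point, while the raw bytes (which is what Notepad reads) are still different from one another.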

  • It sounds like all you want to do is get a UTF-8 file, written in French, to display properly in the MS VSCode "Editor" window(s). Try this: https://stackoverflow.com/a/40365121/421195. This might also help: `File > Preferences > Settings > Encoding > Choose your option` – paulsm4 May 24 '23 at 22:48
  • @user I'm trying any and every solution I can find. I am incidentally learning about all these file formats trying to solve this. Some answers had suggested bringing it back to Windows-1252, which I tried. I brought it up in case someone suggests it. – Marie-Elaine May 24 '23 at 23:01
  • @paulsm4, the Encoding setting is already set to UTF-8 for reading and writing. – Marie-Elaine May 24 '23 at 23:04
  • @user VSCode is the only one that does not read the UTF-8 file as the UTF-8 file it is. And it's not an overall issue as not all the files I open have an issue. File A will always have this issue, but file B won't. I want VSCode to not have this issue, and I would like to know how this can be fixed. I added images to the original post. – Marie-Elaine May 24 '23 at 23:28
  • @user that is sadly not an issue as typing on the file with the problematic / correct letters appears as they should. – Marie-Elaine May 24 '23 at 23:39
  • can you [edit] your post to show us what you see when you paste each of those "problematic characters" into something like https://onlineunicodetools.com/convert-unicode-to-code-points ? I'm curious how exactly those characters are being encoded, since I'm pretty sure there are multiple ways of doing some of them in unicode. One way is combining accent characters, and another is using dedicated code points. show also how the one that _you_ type is encoded. – starball May 25 '23 at 00:09
  • also, I suggest you take a look at the google search results from "[`github vscode issues encoding accent characters`](https://www.google.com/search?q=github+vscode+issues+encoding+accent+characters+%EF%BF%BD&oq=github+vscode+issues+encoding+accent+characters+%EF%BF%BD)" and see if anything there solves your problem. Ex. https://github.com/PowerShell/vscode-powershell/issues/1680, https://github.com/Microsoft/vscode/issues/50197, https://github.com/microsoft/vscode/issues/18274, https://github.com/microsoft/vscode/issues/5388, https://github.com/Microsoft/vscode/issues/1838 – starball May 25 '23 at 00:15
  • marie, can you please also include text in addition to the images you have shown of text? see [this](https://meta.stackoverflow.com/a/285557/11107541) for why we like having text and not just images. You can keep the images you have shown because the highlighting in the images is useful, but please add the text as well. – starball May 25 '23 at 21:36
  • @user Thank you very much for the help. I figured out how to display titles for the images, so it should be easier to read now – Marie-Elaine May 25 '23 at 22:07
  • @user Uploading the file to the unicode converter does not seem to recognise what's behind the error. Seems like Notepad is the one doing some sort of magic. The possible solutions in your link either cover the basics of converting to UTF-8, or cross-compatibility issues with extensions I do not use. However, one comment mentioned that VSCode and the use of an ISE for securely transferring files do not go well together. ISE is a service that I am almost certain could be in use on either or both ends of the file transfers. I'm hoping not, but it could be that. – Marie-Elaine May 25 '23 at 22:13
  • I manually created a snippet of your example in Notepad (`Quels types de systèmes d'exploitation`) and saved it as UTF-8 file "tmp.html". I then opened it in VSCode: the editor window displayed it fine, and it said "UTF-8" in the status bar at the bottom. I then imported the file into CodePoints (https://onlinetools.com/unicode/convert-unicode-to-code-points). My codepoint for the accent grave "e" is `1111001`. Your codepoint shows `1111111111111101` (approximately: didn't count all the "1s"). SUGGESTION: Compare a "good" file with a "bad" file in a hex editor. – paulsm4 May 26 '23 at 02:32
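The "good file vs bad file" comparison suggested in the last comment can also be done without a hex editor. Below is a minimal Python sketch that lists every run of non-ASCII bytes with its offset; the function name and the file names in the usage note are hypothetical.

```python
def non_ascii_runs(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, hex string) for each run of consecutive non-ASCII bytes."""
    runs = []
    start = None  # offset where the current non-ASCII run began, if any
    for i, byte in enumerate(data):
        if byte >= 0x80:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, data[start:i].hex(" ")))
            start = None
    if start is not None:  # file ended in the middle of a run
        runs.append((start, data[start:].hex(" ")))
    return runs
```

For example, running `non_ascii_runs(open("good.html", "rb").read())` and the same on `"bad.html"` (hypothetical names) gives two lists to compare. A correctly encoded "è" in UTF-8 is `c3 a8`; a lone `e8` would point to a single-byte encoding such as Windows-1252, and `ef bf bd` would mean the "�" character is already stored in the file.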
