82

I have a website, and I can send my Turkish characters with jQuery in Firefox, but Internet Explorer doesn't send my Turkish characters. I looked at my source file in notepad, and this file's code page is ANSI.

When I convert it to UTF-8 without BOM and close the file, the file is again ANSI when I reopen.

How can I convert my file from ANSI to UTF-8?

Aurora0001
Kerem Bekman
  • 1
    You can use the tool I wrote for that, I also suffered from same problem and made my own way out. https://github.com/srcnalt/ANSI-to-UTF8 – Sarge Feb 03 '14 at 08:33
  • I agree that – in a strict sense – this question is not [on-topic](https://stackoverflow.com/help/on-topic) for Stack Overflow. ~ * ~ But it's very much [ON-topic](https://superuser.com/help/on-topic) for _Super User_. After more than 9 years (!), it _still_ hasn't been migrated to Super User. Such a pity. – Henke Sep 28 '22 at 11:20

3 Answers

79

Regarding this part:

When I convert it to UTF-8 without bom and close file, the file is again ANSI when I reopen.

The easiest solution is to avoid the problem entirely by properly configuring Notepad++.

Try Settings -> Preferences -> New document -> Encoding -> choose UTF-8 without BOM, and check Apply to opened ANSI files.

[Image: Notepad++ preferences, UTF-8 with "Apply to opened ANSI files" checked]

That way all the opened ANSI files will be treated as UTF-8 without BOM.

For an explanation of what's going on, read the comments below this answer.

To fully learn about Unicode and UTF-8, read this excellent article from Joel Spolsky.

jakub.g
  • That helped me a lot. Thanks. I do not understand the behavior though. Because I open an existing file and not a new one. – Dr. Manuel Kuehner Aug 08 '14 at 09:34
  • 2
    The `Apply to opened ANSI files` option is relevant in your situation: when you have a file that contains only plain ASCII characters (without accents etc.), and you don't have a BOM at the beginning of the file, then the editor by default treats it as an ANSI file, because there's nothing in this file to indicate that you might want to treat it as a UTF-8 file. However, when you add, say, `Ö` and save it as UTF-8 w/o BOM, even though there's no BOM at the beginning of the file, from the presence of the two-byte sequence behind `Ö` (0xC3 0x96 in this case) the editor learns "this has to be UTF-8". – jakub.g Aug 08 '14 at 12:45
  • 2
    In other words, when you save a plain ASCII-only ANSI file as UTF-8, the output is identical to what you would get by saving it as ANSI. You have to tell the editor to *treat it* as UTF-8 when you open it. For the file to *be* recognizable as UTF-8, it either has to start with a BOM, or contain certain multi-byte sequences. The behavior of the editor when you input `Ö` in an ANSI file is configuration dependent. – jakub.g Aug 08 '14 at 12:53
  • Thanks for the elaborate answer. In my specific case I need ISO-8859-1 encoding. Is there something that I can put at the beginning of a text file so that the editors "see" that? – Dr. Manuel Kuehner Aug 09 '14 at 11:24
  • 1
    AFAIK the only encoding that you can enforce from the editor on read time, by putting certain characters in the file, is UTF-8 with BOM. – jakub.g Apr 25 '16 at 15:14
  • The current version of Notepad++ (7.8.5) does not offer the 'Apply to opened ANSI files' option anymore while saving. Instead I had to use the menu option Encoding > Convert to UTF-8 and then save the file. – Sander de Jong Apr 16 '20 at 14:00
  • 1
    @SanderdeJong works for me with 7.8.5 32-bit - I posted an image – jakub.g Apr 17 '20 at 17:43
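
The detection behaviour described in the comments above can be sketched roughly as follows. This is a simplified illustration in Python 3, not Notepad++'s actual algorithm; the function name `guess_encoding` and the labels it returns are hypothetical.

```python
import codecs

def guess_encoding(data, default="ansi"):
    """Rough guess at how an editor might classify raw file bytes.

    Hypothetical helper: real editors use more elaborate heuristics.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"        # an explicit BOM settles the question
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return default            # not valid UTF-8, fall back to the ANSI code page
    if all(ord(ch) < 128 for ch in text):
        return default            # pure ASCII: the bytes are the same in ANSI and UTF-8
    return "utf-8"                # valid multi-byte sequences were found

print(guess_encoding(b"hello"))
print(guess_encoding("\u00d6".encode("utf-8")))    # the bytes 0xC3 0x96
print(guess_encoding(codecs.BOM_UTF8 + b"hello"))
print(guess_encoding("\u00d6".encode("cp1252")))   # the single byte 0xD6
```

The pure-ASCII branch is the case the `Apply to opened ANSI files` setting addresses: nothing in the bytes favours either reading, so the editor's configured default decides.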
46

Maybe this is not the answer you needed, but I encountered a similar problem, so I decided to put it here.

I needed to convert 500 XML files to UTF-8 via Notepad++. Why Notepad++? When I used the option "Encode in UTF8" (many other converters use the same logic), it messed up all the special characters, so I had to use "Convert to UTF8" explicitly.

Here are some simple steps to convert multiple files via Notepad++ without messing up special characters (e.g. diacritical marks).

  1. Run Notepad++ and then open menu Plugins->Plugin Manager->Show Plugin Manager
  2. Install Python Script. When plugin is installed, restart the application.
  3. Choose menu Plugins->Python Script->New script.
  4. Choose its name, and then paste the following code:

convertToUTF8.py

import os
from Npp import notepad  # provided by the Python Script plugin; only available inside Notepad++

filePathSrc = "C:\\Users\\"  # Path to the folder with files to convert
for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        # Specify the type of the files; skip copies written by an earlier run
        if fn.endswith('.xml') and not fn.endswith('_utf8.xml'):
            notepad.open(os.path.join(root, fn))
            notepad.runMenuCommand("Encoding", "Convert to UTF-8")
            # notepad.save()
            # If you try to save/replace the original file, an annoying confirmation
            # window may pop up, so the converted copy is written next to the source.
            notepad.saveAs(os.path.join(root, fn[:-4] + '_utf8.xml'))
            notepad.close()

Finally, run the script from within Notepad++.
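
If the plugin is not an option, the same batch conversion can be sketched in standalone Python 3, outside Notepad++. This is a minimal sketch, not part of the original script: the function names are hypothetical, and the source encoding `cp1254` (the Windows Turkish code page) is an assumption that must be replaced with whatever "ANSI" means on the machine that produced the files.

```python
import os

def convert_to_utf8(src_path, dst_path, source_encoding="cp1254"):
    """Re-encode one text file as UTF-8 without BOM.

    source_encoding is an assumption; decoding with the wrong code page
    silently produces wrong characters.
    """
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

def convert_folder(folder, extension=".xml", source_encoding="cp1254"):
    """Write a *_utf8 copy next to every matching file under folder."""
    converted = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            base, ext = os.path.splitext(name)
            # Skip other file types and copies produced by an earlier run
            if ext.lower() != extension or base.endswith("_utf8"):
                continue
            dst = os.path.join(root, base + "_utf8" + ext)
            convert_to_utf8(os.path.join(root, name), dst, source_encoding)
            converted.append(dst)
    return converted
```

Calling `convert_folder("C:\\Users\\")` would mirror the script above, leaving the originals in place. Unlike "Convert to UTF-8" in the editor, this does no detection at all, so files that are already UTF-8 should be kept out of the folder.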

Jun Murakami
  • 2
    Great solution. Since I use notepad++ localization I've had to translate 'Encoding' and 'Convert to UTF-8' options, weird. – Piotr Jun 29 '13 at 14:41
  • I wonder how to run the python script? I run it in a command line and it says that notepad cannot be found. – flexwang May 27 '14 at 03:08
  • 2
    Hi flexwang, you should run it from Notepad++ – Jun Murakami May 27 '14 at 08:47
  • I got an error message because of the Chinese characters. https://www.dropbox.com/s/f2efnzt9cd2i5or/%E5%B1%8F%E5%B9%95%E6%88%AA%E5%9B%BE%202014-05-31%2015.59.03.png – Zhang LongQI May 31 '14 at 08:06
  • 2
    doesn't work anymore :( – Phil Feb 20 '16 at 17:14
  • I updated the script, try it now. Also, I restored the previous `notepad.save()` option as a commented line. It has been reported that it gives an annoying confirmation window, but for me it works silently and changes the original files. You may try to interchange these two and see what is the best fit for you. – Jun Murakami Feb 21 '16 at 09:46
  • The plugin for running these scripts seems to be broken in the latest Notepad++. – Mathias Lykkegaard Lorenzen Apr 01 '16 at 20:31
  • I'm using the latest Notepad++, and the problem seems to be that I cannot import the Python module `os`. I wrote this script that loops over all open files, converts them, and saves them if their full file path does not begin with "new" (this avoids the save dialogs): http://pastebin.com/2jJRC3B0 For some reason I can't post it as an answer here. I find it practical to do it on all open files rather than on a folder, since you can run over the files first and check if they indeed have the right format set. – Jbjstam Sep 30 '16 at 14:42
16

If you don't have non-ASCII characters (codepoints 128 and above) in your file, UTF-8 without BOM is the same as ASCII, byte for byte - so Notepad++ will guess wrong.

What you need to do is to specify the character encoding when serving the AJAX response - e.g. with PHP, you'd do this:

header('Content-Type: application/json; charset=utf-8');

The important part is to specify the charset with every JS response; otherwise IE will fall back to the user's system default encoding, which is wrong most of the time.
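
Why the fallback goes wrong can be seen in a small Python 3 sketch. The sample payload and the choice of `cp1254` as the client's system code page are assumptions made for illustration only; the header string simply mirrors the PHP line above.

```python
import json

content_type = "application/json; charset=utf-8"  # same value as the PHP header above

payload = {"city": "\u015ei\u015fli"}  # hypothetical sample data with Turkish characters

# What the server sends: the JSON text encoded as UTF-8 bytes
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# A client that honours charset=utf-8 recovers the original text
as_utf8 = body.decode("utf-8")

# A client that falls back to a system code page (cp1254 assumed here)
# reads each UTF-8 byte as a separate character and gets mojibake
as_fallback = body.decode("cp1254", errors="replace")

print(content_type)
print(as_utf8)
print(as_fallback)
```

The bytes on the wire are the same in both cases; only the declared charset tells the client how to read them.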

Piskvor left the building
  • Why isn't this the accepted answer? It's the only answer that explains what is happening, and what the real solution to the question is. – Máthé Endre-Botond Oct 03 '16 at 11:16
  • _so Notepad++ will guess wrong_ – Maybe just a matter of wording, but I wouldn't say that Notepad++ guesses it _wrong_. I'd rather say that – for a file containing _only_ ASCII characters – encoding as ANSI or UTF-8 is equally correct. And the only way to make Notepad++ choose UTF-8 over ANSI is to make UTF-8 the _default_ encoding (for pure ASCII files). [This answer](https://stackoverflow.com/a/7423450) shows how to do just that. – Henke Sep 28 '22 at 12:44