Notepad++ shows UCS-2LE while Ubuntu `file [file]` shows UTF-16LE, I am confused?

Question

I am trying to convert a file generated from MSSQL to UTF-8. When I open the MSSQL output in Notepad++ on Windows Server 2003, it recognises the file as UCS-2LE. I copied the file to an Ubuntu machine, where `file [file]` reports the encoding as UTF-16LE. I am really confused: the names are different, so there must be some difference between the encodings, but why do I see both for the same file? It is a .csv file generated from an MSSQL query.

Old question, but I think the answer from BenW is correct and should be marked as accepted. Or is there still something "open" for you? – Simon Sobisch, Sep 27 '17 at 07:33

BenW · Accepted Answer · 2012-07-31T08:46:35.093

9

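For the most part, UTF-16 and UCS-2 are the same thing. For any character in the Basic Multilingual Plane (which covers almost all text in everyday use) the bytes are identical. The only difference is that UTF-16 can also encode characters outside that range as a pair of two-byte units, which UCS-2 cannot.

What it means is that each character is normally two bytes wide. "LE" stands for little endian, i.e. each two-byte character is stored with the low byte first.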
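
To make the byte layout concrete, here is a small Python sketch (Python is my choice for illustration, it is not something the question uses). It shows the low-byte-first order for an ordinary character, and the one case where UTF-16 goes beyond two bytes. The sample characters are arbitrary.

```python
# "A" is U+0041: two bytes in UTF-16, low byte first in the LE variant.
le_bytes = "A".encode("utf-16-le")   # b'A\x00'  (0x41 0x00)
be_bytes = "A".encode("utf-16-be")   # b'\x00A'  (0x00 0x41)

# A character outside the Basic Multilingual Plane (U+1F600) needs a
# surrogate pair in UTF-16, i.e. two 16-bit units = four bytes.
# UCS-2 has no way to represent it.
astral_bytes = "\U0001F600".encode("utf-16-le")

print(le_bytes, be_bytes, len(astral_bytes))
```

For a typical CSV export containing only ordinary letters, digits and punctuation, every character falls in the two-byte case, so the two names describe the same bytes.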

If you want to convert to UTF-8, in Notepad++ click Convert to UTF-8 in the Encoding menu, then save.

If your other programs choke on the file after doing this, or you see two garbage characters at the start of the file, then click Convert to UTF-8 without BOM instead.
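
If you would rather script the conversion than click through Notepad++, the same steps can be sketched in Python. This is only a minimal sketch under a few assumptions: the input really is UTF-16LE (with or without a BOM), the whole file fits in memory, and the function names and file names are made up for illustration.

```python
import codecs


def utf16le_to_utf8(data: bytes, with_bom: bool = False) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8, optionally writing a UTF-8 BOM."""
    # Drop a leading UTF-16LE BOM (FF FE) so it is not carried into the output.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    text = data.decode("utf-16-le")
    # "utf-8-sig" prepends the UTF-8 BOM (EF BB BF); plain "utf-8" does not.
    return text.encode("utf-8-sig" if with_bom else "utf-8")


def convert_file(src_path: str, dst_path: str, with_bom: bool = False) -> None:
    """Read src_path as UTF-16LE and write dst_path as UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(utf16le_to_utf8(raw, with_bom))
```

Usage would be something like `convert_file("export.csv", "export-utf8.csv")`, where both file names are hypothetical. Leaving `with_bom=False` corresponds to the "without BOM" option above.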

edited Jul 31 '12 at 08:46

answered Jul 31 '12 at 08:37

BenW


UTF-16 characters are also 2 bytes wide as far as I know. Why is `file [file]` in Ubuntu showing me UTF-16LE? In the list of encodings that `iconv -l` recognises, I can see both encodings. When I convert this file to `utf-8`, which encoding should I use as the input encoding? – tough Jul 31 '12 at 08:51
Thanks for the answer and the edit, but I am trying to convert on the Ubuntu machine. If you read my explanation carefully, you can see that I need to choose one of the two encodings as the input encoding for the command `iconv -f [input encoding] -t [output encoding] [file]`. What would you suggest in this case? – tough Jul 31 '12 at 09:04
I converted the file to UTF-8 without BOM using Notepad++, but when I open the file again later, it shows the encoding as ANSI instead of UTF-8 without BOM. – tough Jul 31 '12 at 11:46
Is this still in Notepad++ or is this on the Ubuntu machine? I don't know anything about Ubuntu, but when UTF-8 without BOM is selected, the Notepad++ status bar should report the encoding to be `ANSI as UTF-8`. – BenW Jul 31 '12 at 17:31
Also, like I said, UTF-16 and UCS-2 are pretty much the same thing. Try both and see which one works. – BenW Jul 31 '12 at 17:32
If your file, saved as UTF-8 without BOM, contains no special characters, it is indistinguishable from ASCII/ANSI. Software can only recognize the encoding by guessing from the contents (or by using the BOM if present). – Rafael Nobre Dec 18 '12 at 17:48
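
The point in the last comment can be checked with a few lines of Python. This is just an illustrative sketch: the sample CSV text and the helper name `has_utf8_bom` are made up.

```python
import codecs

# ASCII-only text produces exactly the same bytes in ASCII and in UTF-8,
# so an editor has nothing to tell the two apart.
plain = "id,name\r\n1,Smith\r\n"
same_bytes = plain.encode("utf-8") == plain.encode("ascii")

# With a non-ASCII character the UTF-8 bytes differ from the text length,
# which gives detection heuristics something to work with.
accented = "id,name\r\n1,Müller\r\n"
utf8_len = len(accented.encode("utf-8"))


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


print(same_bytes, utf8_len > len(accented), has_utf8_bom(plain.encode("utf-8")))
```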
