266

I'm used to using vim to modify a file's line endings:

$ file file
file: ASCII text, with CRLF line terminators
$ vim file
:set ff=mac
:wq
$ file file
file: ASCII text, with CR line terminators

Is it possible to use a similar process to change a file's Unicode encoding? I'm trying the following, which doesn't work:

$ file file.xml
file.xml: Unicode text, UTF-16, little-endian
$ vim file.xml
:set encoding=utf-8
:wq
$ file file.xml
file.xml: Unicode text, UTF-16, little-endian

I saw someone say that he could "set fileencoding=utf-8, then update and write the file, and it works," but I seem to be missing something, or else he was confused. I don't know what he meant by "then update."

skiphoppy

6 Answers

286

From the doc:

:write ++enc=utf-8 russian.txt

So you should be able to change the encoding as part of the write command.

Johan
Brian Agnew
187

Notice that there is a difference between

set encoding

and

set fileencoding

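In the first case, you change the encoding Vim uses internally for buffer text (and, unless termencoding says otherwise, for what is shown in the terminal); this alone does not convert the file on disk. In the second case, you change the encoding of the file behind the current buffer, so the text is converted to that encoding when the file is written.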
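
As a rough sketch of how this plays out for the file in the question, the two options can be inspected side by side before converting. This assumes Vim detected the UTF-16 encoding when it loaded the file, which depends on the local fileencodings setting; the value mentioned in the comment is what one would typically expect, not a guarantee.

```vim
" Show both options. For a file detected as UTF-16 little-endian,
" fileencoding would typically read something like utf-16le.
:set encoding? fileencoding?
" Ask Vim to convert the buffer to UTF-8 on the next write.
:set fileencoding=utf-8
" Write the file; the conversion happens here.
:w
```

Running file on file.xml afterwards is a quick way to check whether the conversion took effect.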

Cristian Ciupitu
Johan
  • 1
    thank you! Apache was outputting utf-8, so was php, so the browser said, so vim said with `set encoding`, and still the pages showed mangled characters that were alright as iso-8859-1. using `set fileencoding` showed a pretty 'Latin1' – Adriano Varoli Piazza Mar 08 '10 at 18:29
86

While it is perfectly possible to do this in vim, why not simply use iconv? Loading a text editor just to convert an encoding seems like using too big a hammer for too small a nail.

Just:

iconv -f utf-16 -t utf-8 file.xml > file.utf8.xml

And you're done.
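
To see the conversion end to end, here is a small sketch that builds a throwaway UTF-16 sample and converts it. The sample file names and the workdir variable are made up for illustration, and it assumes an iconv that honors the byte order mark when given -f utf-16 (as glibc's does).

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Build a tiny UTF-16LE sample: BOM (FF FE) followed by "hi\n".
printf '\377\376h\000i\000\n\000' > "$workdir/sample.utf16.xml"

# Convert to UTF-8; the BOM tells iconv which byte order to use.
iconv -f utf-16 -t utf-8 "$workdir/sample.utf16.xml" > "$workdir/sample.utf8.xml"

# Dump the raw bytes of the result for inspection.
od -An -c "$workdir/sample.utf8.xml"
```

Writing to a new file and checking it before replacing the original is the safer habit, since redirecting iconv's output onto its own input file would truncate it.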

60

Just as in your steps, setting fileencoding should work. However, I'd also add "set bomb" so that a byte order mark is written, which helps editors recognize the file as UTF-8.

$ vim file
:set bomb
:set fileencoding=utf-8
:wq
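
Whether a file actually carries a UTF-8 byte order mark can be checked from the shell by looking at its first three bytes (EF BB BF). The following is only a sketch: has_utf8_bom is a helper name invented for this example, and the two sample files are hypothetical stand-ins for a file written with and without "set bomb".

```shell
workdir=$(mktemp -d)

# Hypothetical samples: one starting with the UTF-8 BOM, one without.
printf '\357\273\277<a/>\n' > "$workdir/with_bom.xml"
printf '<a/>\n' > "$workdir/no_bom.xml"

# has_utf8_bom FILE: succeed if FILE starts with the bytes EF BB BF.
has_utf8_bom() {
    # Hex-dump the first three bytes and strip spaces and newlines.
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}

if has_utf8_bom "$workdir/with_bom.xml"; then
    echo "with_bom.xml: BOM present"
fi
if ! has_utf8_bom "$workdir/no_bom.xml"; then
    echo "no_bom.xml: no BOM"
fi
```

The same check on a file saved from vim shows whether "set bomb" or "set nobomb" was in effect at write time.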
Francis
  • 8
    Thanks for your answer, it led me to learn more about the UTF byte order mark. However FYI, setting a BOM seems unnecessary/inadvisable for UTF-8 since it's not a fixed byte-length format like 16 or 32. See [here](http://vim.wikia.com/wiki/Working_with_Unicode) for an explanation and reference. It's not a problem (and even helpful) for vim, I just thought people should just be aware that it may cause compatibility issues elsewhere. – joelhardi Jun 01 '11 at 19:22
  • 2
    Is it `bomb` or `bom`, and can it be `unset`? **EDIT**: Yes, you can remove it via `set nobomb`. – icedwater Jul 01 '14 at 02:33
  • 7
    Yes, VIm set up us the `bomb` (with a b). – ruffin Oct 16 '14 at 14:57
  • 1
    per the docs, `:set bomb` is turned on if `:set fenc=utf-8`.. see `:he bomb` – Evan Carroll Dec 04 '14 at 22:15
  • 13
    all our base encoding are now belong to UTF-8 – roblogic Aug 25 '15 at 01:49
8

It could be useful to change the encoding just on the command line before the file is read:

rem On Microsoft Windows
vim --cmd "set encoding=utf-8" file.ext
# In *nix shell
vim --cmd 'set encoding=utf-8' file.ext

See :help starting and :help --cmd.

Hans Ginzel
  • 4
    The first variation should also work on *nix shells. `'single quotes'` are only needed to escape all meta characters, which is usually not what you want. – jpaugh Feb 06 '17 at 15:36
0

The autocommand `autocmd GUIEnter * set encoding=utf-8` should help.