4

I use vim and gVim on Windows, and vim in a virtual Linux box for programming. Often I need to change gettext catalog files. However, the support for Unicode characters seems to be incomplete in both Windows versions, perhaps because of the character set.

For example, umlauts (äöü) work just fine, but typographical quotes („“ or “ˮ) and some other characters such as the em dash and the ellipsis don't (they do in the Linux box). Vim complains about conversion errors and opens the files in read-only mode; when I override this and save anyway, those characters are broken.

Other Windows programs support those characters just fine, e.g. TortoiseSVN.

Note that this is not a “wrong encodingˮ matter, like latin-1 instead of utf-8, as this would affect the umlauts as well. I'm aware of the settings enc, fenc, fencs, and of :e ++enc=utf-8.

:version tells me: version 7.3, MS Windows 32 bit GUI version w/OLE support, including corrections 1-46; +multi_byte_ime/dyn.

Update: Updating to Vim 7.4 didn't solve the problem. +multi_byte_ime/dyn and, since the options are listed more readably now: +digraphs, -xfontset, -postscript (I don't know whether or not they are of interest).

Since I work on the same files with Linux Vim (still 7.3, including corrections 1-547) and, via Samba, with the now updated Windows gVim, I tried the following: I opened a catalog file with the Linux version, which used to handle the funny quote chars all right (:set enc? fenc? reported encoding=utf-8, fileencoding=utf-8), and saved it as Latin-1 (:set fenc=latin-1, after which the file was marked as changed; updated the markers; :w). I got a conversion error; however, some changes had been written.

When re-opening the file with the same Linux version, I got correct umlauts, encoding=utf-8 / fileencoding=latin-1, but incorrect quote chars.

Tobias
  • What's your `'encoding'` setting? With `utf-8` (and the file in the same format), there should be no reason for conversion and therefore no such errors. – Ingo Karkat Feb 27 '14 at 10:41
  • My files are saved `utf-8`-encoded, especially the gettext catalogs (`.po`). For these, the encoding is contained in a "Content-Type" header (and used for compilation to `.mo` files, I expect) as well. – Tobias Feb 28 '14 at 08:51

2 Answers

8

After reading the question How to view UTF-8 Characters in VIM or Gvim, I tried several guifont-settings (:set guifont? yielded nothing), and indeed some of them feature typographical quotes.

The following guifont settings worked for me on my Windows 8.1 system:

  • Lucida_Console
  • DejaVu_Sans_Mono
  • Courier_New
  • Consolas

For guifontwide I found

  • MS_Mincho

to work for Chinese characters.
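
A minimal sketch of how one of these fonts could be made permanent in a vimrc, rather than set by hand each session. The font names come from the lists above; the `:h11` point size and the `has()` guards are assumptions for illustration, not something I verified on every Windows version.

```vim
" Only touch GUI fonts when running gVim on Windows.
if has('gui_running') && has('win32')
  " Any font from the list above should do; the size (:h11) is an assumption.
  set guifont=Consolas:h11
  " Optional: font for double-width (e.g. CJK) characters, as listed above.
  set guifontwide=MS_Mincho
endif
```

After restarting gVim, `:set guifont?` should report the chosen font instead of an empty value.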

Tobias
  • As the author of this answer I'd like to make clear that I didn't add the `guifontwide` and `MS_Mincho` information; I don't have any experience with multi-byte character sets. This should have been a (likely useful, for a certain audience) comment, not an edit. – Tobias Jul 01 '16 at 07:10
  • More generally, you can try fonts on your own system with the methods described at http://vim.wikia.com/wiki/Setting_the_font_in_the_GUI. – Eric O. Lebigot Apr 26 '18 at 13:43
1

The only way I have been able to consistently change a file's encoding to UTF-8 on Windows is by using Notepad++ or PowerShell (see below). Regardless of the Vim version, changing the file encoding from within Vim gives inconsistent results at best.

Once the file has been given UTF-8 encoding outside of Vim, there are no further issues. A file encoding set through Vim on Linux or Mac is respected on Windows.

In this thread a PowerShell command is suggested to change the encoding. That is the fastest way I know of to set a project to UTF-8 on Windows and work without further hassle.

In your examples above, note that there is a difference between :set encoding=utf-8 (which sets the encoding Vim uses internally and for display, and does not change the file) and :set fileencoding=utf-8, which changes the encoding the file is written in on save.
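
To make the distinction concrete, here is a short sketch of how the options mentioned in the question (enc, fenc, fencs) are typically combined. The particular candidate list given to `fileencodings` is an assumption chosen for illustration, not a setting taken from the question.

```vim
" Encoding Vim uses internally (buffers, registers, display).
" Usually set once, early in the vimrc.
set encoding=utf-8

" Encodings tried in order when an existing file is read.
" This particular list is an assumption for illustration.
set fileencodings=ucs-bom,utf-8,latin1

" Per buffer: the encoding the current file will be written in.
setlocal fileencoding=utf-8

" Show what is currently in effect.
set encoding? fileencoding?
```

With `encoding` and `fileencoding` both set to utf-8, no conversion should be needed when writing, which matches the point made in the comment under the question.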

FvD
  • As I tried to make clear from the beginning: This is **not** about using `utf-8` instead of `latin-1`. This is about `utf-8` working for umlauts, but not for _(only slightly)_ more funny characters like quotes, dashes and ellipses. – Tobias Feb 28 '14 at 08:33