4

I use Windows 7 with code page 950 (Big5, Traditional Chinese), and I want to use svn to manipulate files whose Unicode names fall outside that code page, such as 简体中文文件.txt (GB2312, Simplified Chinese).

With chcp 950 active, when I run:

svn add .\简体中文文件.txt

I get an error:

svn: warning: W155010: 'D:\path\to\work-dir\?体中文文件.txt'
not found
svn: E200009: Could not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

If I use chcp 65001 (UTF-8), I get an even worse error:

svn: warning: W155010: 'D:\path\to\work-dir\?体svn: E200009: C
ould not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

I also tried chcp 1200 (UTF-16LE), but it reports:

Invalid code page

TortoiseSVN seems to handle those files correctly. However, I need to write scripts that call svn to run several automated jobs. Is there any solution available?

Danny Lin
  • Perhaps subversion's `--encoding` option will be helpful? – Harry Johnston Oct 07 '14 at 03:47
  • Is there a detailed demo or documentation about this? I always get a `Subcommand 'add' doesn't accept option '--encoding ARG'` when I attempt to call `svn add --encoding utf8 .\简体中文文件.txt` or `svn --encoding utf8 add .\简体中文文件.txt`... – Danny Lin Oct 07 '14 at 04:01
  • OK, so I guess that option isn't relevant. There's a chance that the file names are in effect being interpreted as UTF-8 anyway; are you sure you are passing the command-line arguments as UTF-8 strings? I don't think you can do that from the console directly, you'll need to use a batch file. The [bug tracker](http://subversion.tigris.org/issues/show_bug.cgi?id=1537) says that Unicode filenames should work. – Harry Johnston Oct 07 '14 at 04:19
  • Actually I think I see why that wouldn't work; either the batch processor or CreateProcessA would treat the UTF-8 string as being in the current code page and convert it to UTF-16, then the C runtime would convert it to ANSI, and the UTF-8 won't survive that. There's an outside chance it would work if you widen UTF-8 to 16 bits without converting it and call CreateProcessW - but since it turns out that the fix for the file access hasn't actually made it to the release version yet, that won't help you right now. – Harry Johnston Oct 07 '14 at 19:24
  • Are you using the TortoiseSVN command-line interface or a different distribution? – Harry Johnston Oct 07 '14 at 19:37
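
The conversion chain described in the comments above (UTF-8 bytes misread as the current code page, widened to UTF-16, then narrowed back to ANSI) can be imitated in Python. This is only a rough sketch: it assumes Python's `cp950` codec is a reasonable stand-in for the Windows conversion routines, and the function name `misread_utf8_as_code_page` is invented for illustration.

```python
def misread_utf8_as_code_page(name: str, code_page: str = "cp950") -> bytes:
    """Imitate UTF-8 bytes being treated as code-page text and converted back."""
    utf8_bytes = name.encode("utf-8")
    # Step 1: the bytes are wrongly assumed to be in the code page and
    # widened to Unicode; invalid sequences become U+FFFD.
    widened = utf8_bytes.decode(code_page, errors="replace")
    # Step 2: the text is narrowed back to the code page; characters that
    # cannot be represented become '?'.
    return widened.encode(code_page, errors="replace")


name = "简体中文文件.txt"
original = name.encode("utf-8")
mangled = misread_utf8_as_code_page(name)
# True would mean the UTF-8 bytes survived the round trip unchanged.
print(mangled == original)
```

If the comparison prints `False`, the bytes that reach the program are no longer the UTF-8 encoding of the file name, which matches the concern raised in the comment.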

2 Answers

2

Programs like svn that use the MS implementation of the C standard library's file I/O functions cannot read command-line input or file names containing characters outside the current code page. You would have to chcp to a suitable code page for each file separately (e.g. 936 for Simplified Chinese).
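
To see which code page could represent a given file name, a round trip through each candidate can be simulated in Python. This is a minimal sketch, assuming Python's `cp950`, `cp936`, and `cp932` codecs approximate the corresponding Windows code pages; the helper names and the candidate list are illustrative, not part of svn or Windows.

```python
# Candidate Windows code pages, named by their Python codec aliases (assumed list).
CANDIDATES = ["cp950", "cp936", "cp932"]


def through_code_page(name: str, code_page: str) -> str:
    """Encode a name into a code page and back; unrepresentable characters become '?'."""
    return name.encode(code_page, errors="replace").decode(code_page)


def usable_code_pages(name: str, candidates=CANDIDATES) -> list:
    """Return the candidate code pages that can represent the name without loss."""
    return [cp for cp in candidates if through_code_page(name, cp) == name]


name = "简体中文文件.txt"
# Does the first character get replaced under code page 950?
print(through_code_page(name, "cp950").startswith("?"))
# Which candidates could a script chcp to before calling svn on this file?
print(usable_code_pages(name))
```

A script could use such a check to pick the chcp value per file, though a name that mixes characters from several code pages would have no usable candidate.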

In theory code page 65001 could cover every character, but unfortunately the MS C runtime has serious bugs that usually break applications when this code page is in use. Microsoft's ongoing failure to fix this long-standing problem leaves UTF-8 a second-class citizen under Windows.

In the future, it looks like https://issues.apache.org/jira/browse/SVN-1537?issueNumber=1537 should fix the problem by using Win32 APIs directly instead of the C stdlib for console writes, though I can't find the related code change to confirm whether console input and file access are addressed in the same way.

Gabriel Devillers
bobince
0

First solution: consider switching Windows itself to UTF-8; see the question What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do? With that setting enabled, svn diff produced correct output on my machine (chcp 65001 alone was apparently not enough).

Second solution: use svn within WSL.
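
For scripted jobs, the WSL route might be driven from Python roughly as follows. This is a sketch under several assumptions: `wsl` is on the PATH, svn is installed inside the WSL distribution, the working copy is reachable from WSL, and the script runs from inside the working copy. The helper names are hypothetical, and whether the WSL svn client is compatible with a working copy created by a Windows client is not verified here.

```python
import subprocess


def wsl_svn_command(subcommand: str, *paths: str) -> list:
    """Build the argument list for running svn inside WSL (hypothetical helper)."""
    return ["wsl", "svn", subcommand, *paths]


def run_svn_in_wsl(subcommand: str, *paths: str) -> subprocess.CompletedProcess:
    """Run svn through WSL; raises CalledProcessError if svn reports failure."""
    return subprocess.run(wsl_svn_command(subcommand, *paths), check=True)


# Build (but do not run) the command for the file from the question.
command = wsl_svn_command("add", "简体中文文件.txt")
print(len(command))
```

Passing the arguments as a list avoids building a command string by hand, so the file name is handed to the child process without an extra quoting step.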

Gabriel Devillers