I want to dump (and later work with) the paths of the locally changed files in my SVN repository. Problem is, there are umlauts in some filenames (like ä, ö, ü).

When I open a PowerShell window in my local trunk folder, I can run svn status and get the result with correct umlauts ("ü" in this case):

PS C:\trunk> svn status -q
M       Std\ClientComponents\Prüfung.xaml
M       Std\ClientComponents\Prüfung.xaml.cs
M       Std\ClientComponents\PrüfungViewModel.cs

When I run the same command from my PowerShell script, the results are different.

Script "DumpChangedFiles.ps1":

foreach ( $filename in svn status -q ) 
{
  Write-Host $filename
}

Results:

PS C:\trunk> .\DumpChangedFiles.ps1
M       Std\ClientComponents\Pr³fung.xaml
M       Std\ClientComponents\Pr³fung.xaml.cs
M       Std\ClientComponents\Pr³fungViewModel.cs

Question: Why are the umlauts wrong? How do I get to the correct results?


Hex dump of svnstatus.txt:

ef bb bf 4d 20 20 20 20 20 20 20 53 74 64 5c 43 6c 69 65 6e 74 43 6f 6d 70 6f 6e 65 6e 74 73 5c 50 72 c2 b3 66 75 6e 67 2e 78 61 6d 6c 0d 0a 4d 20 20 20 20 20 20 20 53 74 64 5c 43 6c 69 65 6e 74 43 6f 6d 70 6f 6e 65 6e 74 73 5c 50 72 c2 b3 66 75 6e 67 2e 78 61 6d 6c 2e 63 73 0d 0a 4d 20 20 20 20 20 20 20 53 74 64 5c 43 6c 69 65 6e 74 43 6f 6d 70 6f 6e 65 6e 74 73 5c 50 72 c2 b3 66 75 6e 67 56 69 65 77 4d 6f 64 65 6c 2e 63 73



Here's the output of the script DumpChangedFiles.ps1 compared to the output of your desired command:

PS C:\trunk> .\DumpChangedFiles.ps1
M       Std\ClientComponents\Pr³fung.xaml
M       Std\ClientComponents\Pr³fung.xaml.cs
M       Std\ClientComponents\Pr³fungViewModel.cs

PS C:\trunk> $PSDefaultParameterValues['*:Encoding'] = 'utf8'; svn status -q
M       Std\ClientComponents\Prüfung.xaml
M       Std\ClientComponents\Prüfung.xaml.cs
M       Std\ClientComponents\PrüfungViewModel.cs

Output of svn --version:

PS C:\trunk> svn  --version
svn, version 1.14.0 (r1876290)
   compiled May 24 2020, 17:07:49 on x86-microsoft-windows

Copyright (C) 2020 The Apache Software Foundation.
This software consists of contributions made by many people;
see the NOTICE file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository access (RA) modules are available:

* ra_svn : Module for accessing a repository using the svn network protocol.
  - with Cyrus SASL authentication
  - handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' scheme
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
  - using serf 1.3.9 (compiled with 1.3.9)
  - handles 'http' scheme
  - handles 'https' scheme

The following authentication credential caches are available:

* Wincrypt cache in C:\Users\reichert\AppData\Roaming\Subversion
jreichert
  • As you're using `Write-Host` to write directly to the console, you may first need to set the console output encoding to UTF-8 with the command `chcp 65001`. Does this resolve your issue? – Zilog80 May 06 '21 at 10:21
  • @Zilog80: Nope. I've tried this already. Output stays the same... – jreichert May 06 '21 at 10:30
  • I guess the svn output may not be UTF-8 encoded in the `in` clause. I suggest you check [this answer](https://stackoverflow.com/a/40098904/3641635); setting the default PowerShell encoding might do the job (`$PSDefaultParameterValues['*:Encoding'] = 'utf8'`). Otherwise we need to find the svn output encoding (or maybe force it with `| Out-File -encoding utf8`). – Zilog80 May 06 '21 at 10:32
  • @Zilog80: Sorry, none of that helped. Even when I run `svn status -q | out-file -encoding utf8 "svnstatus.txt"`, the text file is created with the correct encoding, but the umlauts are still wrong. What I don't understand: if the "svn status" command is the problem, why does it work when I use a PowerShell window instead of a script? – jreichert May 06 '21 at 11:16
  • Did you save `DumpChangedFiles.ps1` in UTF-8 encoding? – Theo May 06 '21 at 11:36
  • Can you also give us a hex dump of `svnstatus.txt`? (On Windows, some editors have hex capabilities.) – Zilog80 May 06 '21 at 11:37
  • @Zilog80: See added hexdump above... – jreichert May 06 '21 at 14:27
  • @Theo: Yes I have saved the DumpChangedFiles.ps1 in utf8 encoding. – jreichert May 06 '21 at 14:28
  • 0xC2 0xB3 does not seem to be UTF-16, and it is neither the NFD nor the NFC form of ü. It's the UTF-8 encoding of superscript 3, so maybe something distorted the output from svn... – Zilog80 May 06 '21 at 14:41
  • @Zilog80: Sorry, that is out of my knowledge. Any guess? – jreichert May 06 '21 at 14:46
  • In a PowerShell console, does the command `$PSDefaultParameterValues['*:Encoding'] = 'utf8'; svn status -q` give the same output (superscript 3)? – Zilog80 May 06 '21 at 14:51
  • @Zilog80: See my own answer below: Here's the output of the script DumpChangedFiles.ps1 compared to the output of your desired command... – jreichert May 07 '21 at 07:24
  • Can you add `$PSDefaultParameterValues['*:Encoding'] = 'utf8';` at the beginning of `DumpChangedFiles.ps1` and tell us whether the script still outputs a superscript 3 instead of ü? – Zilog80 May 07 '21 at 07:45
  • @Zilog80: Yes, after adding $PSDefaultParameterValues['*:Encoding'] = 'utf8'; at the beginning of DumpChangedFiles.ps1, the script still outputs a superscript3 instead of ü – jreichert May 07 '21 at 09:45
  • Can you give us the output of `svn --version`? Does your svn client support `-encoding`? – Zilog80 May 07 '21 at 09:55
  • @Zilog80: I don't know if my svn client does support -encoding For the output of svn --version see answer below... – jreichert May 07 '21 at 13:59
  • @Zilog80: Tortoise SVN – jreichert May 07 '21 at 14:01
  • OK, I get the same problem (superscript 3 instead of ü) with TortoiseSVN. I guess its `--encoding` option will not help here, as it concerns the encoding of committed and checked-out files. – Zilog80 May 07 '21 at 14:19
  • Finally found what's wrong: PowerShell ISE. Add `[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1252)` at the beginning of your script and tell me if that works for you. – Zilog80 May 07 '21 at 14:55
  • @Zilog80: Thank you very much! I marked the question as answered :-) – jreichert May 10 '21 at 08:19

1 Answer

The problem comes from PowerShell ISE: the svn command in your script is executed through PowerShell ISE, which encodes its output as Windows-1252 (or whatever your default Windows locale uses).

You can use the following to get correct output (check your Windows locale and adjust the code page if it is not 1252):

[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1252)
foreach ( $filename in svn status -q ) 
{
  Write-Host $filename
}
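
As a rough illustration of why a code-page mismatch would produce exactly this symptom, the sketch below decodes the single byte 0xFC under two code pages. It assumes that svn emitted the Windows-1252 byte for ü and that the receiving side decoded it as code page 850 (a common Western European OEM code page); neither assumption is confirmed in this thread, but the result lines up with the `c2 b3` bytes in the hex dump from the question.

```powershell
# Assumption: this is the byte svn wrote for "ü" (its Windows-1252 encoding).
$bytes = [byte[]]@(0xFC)

$ansi = [System.Text.Encoding]::GetEncoding(1252)  # Western European ANSI code page
$oem  = [System.Text.Encoding]::GetEncoding(850)   # assumed OEM code page on the decoding side

$asAnsi = $ansi.GetString($bytes)  # what the byte means in Windows-1252
$asOem  = $oem.GetString($bytes)   # what the same byte means in code page 850

# Print code points instead of glyphs so the result does not depend on the console font.
'{0:X4}' -f [int][char]$asAnsi     # 00FC (ü)
'{0:X4}' -f [int][char]$asOem      # 00B3 (superscript 3)

# UTF-8 bytes of the mis-decoded character, as they would end up in a UTF-8 file.
$utf8Hex = ([System.Text.Encoding]::UTF8.GetBytes($asOem) |
    ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$utf8Hex                           # C2 B3
```

If your system uses a different OEM code page, the wrong character will differ, but the mechanism would be the same.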

A previous, unanswered question seems to relate to the same problem with ISE: "Powershell ISE has different codepage from Powershell and I can not change it".
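
Since the goal in the question is to work with the paths afterwards, a variation on the script might set the encoding only for the duration of the svn call and keep just the path part of each line. This is a sketch, not a tested recipe: `Get-ChangedPath` is a made-up helper name, code page 1252 is carried over from the snippet above, and the fixed offset of 8 characters is read off the sample output in the question ("M" followed by seven spaces) rather than taken from the svn documentation.

```powershell
# Hypothetical helper: drops the leading status columns from `svn status` lines.
function Get-ChangedPath {
    param([string[]]$StatusLine)
    foreach ($line in $StatusLine) {
        # Assumption: the path starts at character offset 8, as in the sample output.
        if ($line.Length -gt 8) { $line.Substring(8) }
    }
}

# Remember the current setting so the script does not change the session permanently.
$previous = [Console]::OutputEncoding
try {
    # Assumption: svn writes Windows-1252 on this machine; adjust for other locales.
    [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1252)
    $paths = Get-ChangedPath -StatusLine (svn status -q)
}
finally {
    [Console]::OutputEncoding = $previous
}

$paths
```

Restoring the previous value in `finally` keeps the change local to the script, which may matter if other commands in the same session expect the original code page.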

Zilog80
  • @mklement0 It seems ISE produces its output using the Windows default locale; maybe it's the same for input, if that helps... – Zilog80 May 07 '21 at 15:23