
I have a PowerShell script that should automatically replace strings in a few files. I'm working on a translation project and I need a version of each file without any diacritics.

Here's the code:

# French
$content = Get-Content -Path 'Translation_Files\Language_fr-FR.bat'
$newContent = $content -replace 'è', 'e'
$newContent = $newContent -replace 'é', 'e'
$newContent = $newContent -replace 'ë', 'e'
$newContent = $newContent -replace 'à', 'a'
$newContent = $newContent -replace 'î', 'i'
$newContent = $newContent -replace 'ï', 'i'
$newContent = $newContent -replace 'ç', 'c'
$newContent = $newContent -replace 'ô', 'o'
$newContent = $newContent -replace 'ü', 'u'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_fr-FR.bat'

# Polish
$content = Get-Content -Path 'Translation_Files\Language_pl-PL.bat'
$newContent = $content -replace 'ą', 'a'
$newContent = $newContent -replace 'ć', 'c'
$newContent = $newContent -replace 'ł', 'l'
$newContent = $newContent -replace 'ó', 'o'
$newContent = $newContent -replace 'ń', 'n'
$newContent = $newContent -replace 'ż', 'z'
$newContent = $newContent -replace 'ź', 'z'
$newContent = $newContent -replace 'ę', 'e'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_pl-PL.bat' 

# German
$content = Get-Content -Path 'Translation_Files\Language_de-DE.bat'
$newContent = $content -replace 'ä', 'a'
$newContent = $newContent -replace 'ö', 'o'
$newContent = $newContent -replace 'ü', 'u'
$newContent = $newContent -replace 'ß', 's'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_de-DE.bat'

# Hungarian
$content = Get-Content -Path 'Translation_Files\Language_hu-HU.bat'
$newContent = $content -replace 'í', 'i'
$newContent = $newContent -replace 'á', 'a'
$newContent = $newContent -replace 'é', 'e'
$newContent = $newContent -replace 'ó', 'o'
$newContent = $newContent -replace 'ő', 'o'
$newContent = $newContent -replace 'ú', 'u'
$newContent = $newContent -replace 'ű', 'u'
$newContent = $newContent -replace 'ē', 'e'
$newContent = $newContent -replace 'è', 'e'
$newContent = $newContent -replace 'ȅ', 'e'
$newContent = $newContent -replace 'e̋', 'e'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_hu-HU.bat'

# Spanish
$content = Get-Content -Path 'Translation_Files\Language_es-ES.bat'
$newContent = $content -replace 'é', 'e'
$newContent = $newContent -replace '¿', '?'
$newContent = $newContent -replace 'á', 'a'
$newContent = $newContent -replace 'ó', 'o'
$newContent = $newContent -replace 'í', 'i'
$newContent = $newContent -replace 'Ó', 'O'
$newContent = $newContent -replace '¡', '!'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_es-ES.bat'

# Italian
$content = Get-Content -Path 'Translation_Files\Language_it-IT.bat'
$newContent = $content -replace 'è', 'e'
$newContent = $newContent -replace 'é', 'e'
$newContent = $newContent -replace 'ì', 'i'
$newContent = $newContent -replace 'ò', 'o'
$newContent = $newContent -replace 'ù', 'u'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_it-IT.bat'

# Portuguese (Brazilian)
$content = Get-Content -Path 'Translation_Files\Language_pt-BR.bat'
$newContent = $content -replace 'á', 'a'
$newContent = $newContent -replace 'â', 'a'
$newContent = $newContent -replace 'ã', 'a'
$newContent = $newContent -replace 'à', 'a'
$newContent = $newContent -replace 'ç', 'c'
$newContent = $newContent -replace 'é', 'e'
$newContent = $newContent -replace 'ê', 'e'
$newContent = $newContent -replace 'í', 'i'
$newContent = $newContent -replace 'ó', 'o'
$newContent = $newContent -replace 'ô', 'o'
$newContent = $newContent -replace 'õ', 'o'
$newContent = $newContent -replace 'ú', 'u'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_pt-BR.bat'

However, when I run this in the PowerShell console, I get a lot of errors such as:

At Update_Translations.ps1:59 char:38
+ $newContent = $newContent -replace 'Ă“', 'O'
+                                      ~~~~~~~
The string is missing the terminator: ".
At Update_Translations.ps1:21 char:43
+ $newContent = $newContent -replace 'Ĺ‚', 'l'
+                                           ~~
Unexpected token 'l'
$newContent = $newContent -replace 'Ăł', 'o'
$newContent = $newContent -replace 'Ĺ„', 'n'
$newContent = $newContent -replace 'ĹĽ', 'z'
$newContent = $newContent -replace 'Ĺş', 'z'
$newContent = $newContent -replace 'Ä™', 'e'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_pl-PL.bat'
# German
$content = Get-Content -Path 'Translation_Files\Language_de-DE.bat'
$newContent = $content -replace 'ä', 'a'
$newContent = $newContent -replace 'ö', 'o'
$newContent = $newContent -replace 'ĂĽ', 'u'
$newContent = $newContent -replace 'Ăź', 's'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_de-DE.bat'
# Hungarian
$content = Get-Content -Path 'Translation_Files\Language_hu-HU.bat'
$newContent = $content -replace 'Ă­', 'i'
$newContent = $newContent -replace 'á', 'a'
$newContent = $newContent -replace 'Ă©', 'e'
$newContent = $newContent -replace 'Ăł', 'o'
$newContent = $newContent -replace 'Ĺ‘', 'o'
$newContent = $newContent -replace 'Ăş', 'u'
$newContent = $newContent -replace 'ű', 'u'
$newContent = $newContent -replace 'Ä“', 'e'
$newContent = $newContent -replace 'è', 'e'
$newContent = $newContent -replace 'Č…', 'e'
$newContent = $newContent -replace 'e̋', 'e'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_hu-HU.bat'
# Spanish
$content = Get-Content -Path 'Translation_Files\Language_es-ES.bat'
$newContent = $content -replace 'Ă©', 'e'
$newContent = $newContent -replace 'Âż', '?'
$newContent = $newContent -replace 'á', 'a'
$newContent = $newContent -replace 'Ăł', 'o'
$newContent = $newContent -replace 'Ă­', 'i'
$newContent = $newContent -replace 'Ă“', 'O'
$newContent = $newContent -replace '¡', '!'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_es-ES.bat'
# Italian
$content = Get-Content -Path 'Translation_Files\Language_it-IT.bat'
$newContent = $content -replace 'è', 'e'
$newContent = $newContent -replace 'Ă©', 'e'
$newContent = $newContent -replace 'ì', 'i'
$newContent = $newContent -replace 'ò', 'o'
$newContent = $newContent -replace 'Ăą', 'u'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_it-IT.bat'
# Portuguese (Brazilian)
$content = Get-Content -Path 'Translation_Files\Language_pt-BR.bat'
$newContent = $content -replace 'á', 'a'
$newContent = $newContent -replace 'â', 'a'
$newContent = $newContent -replace 'ĂŁ', 'a'
$newContent = $newContent -replace 'Ă ', 'a'
$newContent = $newContent -replace 'ç', 'c'
$newContent = $newContent -replace 'Ă©', 'e'
$newContent = $newContent -replace 'ĂŞ', 'e'
$newContent = $newContent -replace 'Ă­', 'i'
$newContent = $newContent -replace 'Ăł', 'o'
$newContent = $newContent -replace 'Ă´', 'o'
$newContent = $newContent -replace 'õ', 'o'
$newContent = $newContent -replace 'Ăş', 'u'
$newContent | Set-Content -Path 'Translation_Files_CHCP_OFF\Language_pt-BR.bat'
' in expression or statement.
    + CategoryInfo          : ParserError: (:) [], ParseException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

Any help would be appreciated. Cheers.

KcrPL
  • 1
    Seems like some tricky encoding thing. It would help if you reduced it to 1 single example of what doesn't work, and maybe link a file, because some might not be copy-pasted correctly. – marsze Sep 19 '20 at 20:31
  • 2
    Somewhere along the line your script has been encoded as UTF8 and then decoded as a different encoding - e.g. ISO-8859-1, which might happen if you downloaded it from a website which sent the UTF8 bytes of your script without setting a Content-Type header, since the default encoding for HTTP is ISO-8859-1. To reproduce the issue, try this: ```$bytes = [System.Text.Encoding]::UTF8.GetBytes("é"); $mangled = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetString($bytes); write-host "[$mangled]";``` and you'll see ```é``` gets converted to ```Ã©```, which is the same kind of mangling as in your error message. – mclayton Sep 19 '20 at 21:35
  • 1
    Your script is far from [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). I guess there is no reason to handle the languages differently (any `a` with a diacritic will eventually end up as an `a`, independent of the language). Besides, I think you are reinventing the wheel, see: [Converting Unicode string to ASCII](https://stackoverflow.com/a/46660695/1701026) – iRon Sep 20 '20 at 07:14
  • 1
    Have you searched for this before creating code using numerous `-replace` actions? Look at [this code](https://stackoverflow.com/a/46660695/1701026) or [here](https://stackoverflow.com/questions/62737875/powershell-replace-special-characters-like-%c3%bc) – Theo Sep 20 '20 at 10:40
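
Following the comments above, here is a minimal sketch of one table-free approach: decompose each character with Unicode normalization (Form D) and drop the combining marks. The function name `Remove-Diacritics` and the explicit mapping for characters that do not decompose (`ł`, `ß`, `¿`, `¡`) are illustrative assumptions, as is the guess that the translation files are UTF-8 and that plain ASCII output is wanted. The script contains only ASCII, so it cannot be mis-decoded the way the original was. In the error output `ł` turned into `Ĺ‚` and `Ó` into `Ă“`, and PowerShell treats typographic quote characters such as `‚` and `“` as string delimiters, which would explain the terminator errors.

```powershell
# Sketch: strip diacritics via Unicode decomposition (Form D).
# Remove-Diacritics is a hypothetical helper name, not a built-in cmdlet.
function Remove-Diacritics {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$Text
    )

    # Characters without a canonical decomposition need an explicit mapping.
    # Code points are used so the script file stays pure ASCII.
    $special = @{
        ([char]0x0142) = 'l'   # l with stroke
        ([char]0x0141) = 'L'   # L with stroke
        ([char]0x00DF) = 's'   # sharp s (mapped to 's' as in the question)
        ([char]0x00BF) = '?'   # inverted question mark
        ([char]0x00A1) = '!'   # inverted exclamation mark
    }

    $sb = New-Object System.Text.StringBuilder
    # Form D splits e.g. e-acute into 'e' followed by a combining accent.
    $decomposed = $Text.Normalize([System.Text.NormalizationForm]::FormD)

    foreach ($ch in $decomposed.ToCharArray()) {
        if ($special.ContainsKey($ch)) {
            [void]$sb.Append($special[$ch])
            continue
        }
        # Keep everything except the combining (non-spacing) marks.
        $category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
        if ($category -ne [System.Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($ch)
        }
    }
    $sb.ToString()
}

# Quick check: "d" + e-acute + "j" + a-grave
Remove-Diacritics -Text ('d' + [char]0x00E9 + 'j' + [char]0x00E0)   # -> deja

# Assumption: source files are UTF-8; folder names are taken from the question.
$source = 'Translation_Files'
$target = 'Translation_Files_CHCP_OFF'
if ((Test-Path $source) -and (Test-Path $target)) {
    foreach ($file in Get-ChildItem -Path $source -Filter 'Language_*.bat') {
        $out = Join-Path $target $file.Name
        Get-Content -LiteralPath $file.FullName -Encoding UTF8 |
            ForEach-Object { Remove-Diacritics -Text $_ } |
            Set-Content -LiteralPath $out -Encoding ASCII
    }
}
```

With `-Encoding ASCII`, any character the function leaves untouched and that is outside ASCII is written as `?`, so it is worth checking the output for stray question marks. If you would rather keep the literal characters in the script, saving the `.ps1` as UTF-8 with a BOM should let Windows PowerShell 5.1 decode it correctly; without a BOM it falls back to the system ANSI code page.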
