How can I convert Turkish characters to ASCII (for example, ş to s)?

I tried -replace, but it didn't do anything. Here is my code:

$posta = $posta.ToLower()
$posta = $posta -replace "ü","u" 
$posta = $posta -replace "ı","i"
$posta = $posta -replace "ö","o"
$posta = $posta -replace "ç","c"
$posta = $posta -replace "ş","s"
$posta = $posta -replace "ğ","g"
$posta = $posta.trim()
write-host $posta

If $posta is eylül, it still returns eylül.

Doruk Han
  • It works for me. ... and BTW: You can chain `-replace` operators like this: `$posta = $posta -replace 'ü','u' -replace 'ı','i' -replace 'ö','o' -replace 'ç','c' -replace 'ş','s' -replace 'ğ','g'` – Olaf Sep 18 '22 at 15:33
  • It's working now, but if I enter öğrenci it returns ogrenci ogrencI – Doruk Han Sep 18 '22 at 15:45
  • So you have an error in your code. You could show it by updating your question. – Olaf Sep 18 '22 at 15:46
  • The code does not return an error – Doruk Han Sep 18 '22 at 15:49
  • @DorukHan Not returning an error doesn't mean that the code has no errors. It just means there is no syntax error, but there are still logic errors in the code – phuclv Sep 18 '22 at 15:54

2 Answers

All credit goes to this answer, combined with a comment on it, which together show the appropriate way to do it: filter out the characters whose Unicode category is NonSpacingMark, then replace ı with i. That answer is not written in PowerShell, so this post shows how it can be done in PowerShell.

The original answer uses Enumerable.Where, which in PowerShell looks like this:

$posta = 'üıöçşğ'
[string]::new([System.Linq.Enumerable]::Where(
    [char[]] $posta.Normalize([Text.NormalizationForm]::FormD),
    [Func[char, bool]]{ [char]::GetUnicodeCategory($args[0]) -ne [Globalization.UnicodeCategory]::NonSpacingMark })).
    Replace('ı', 'i')

However, LINQ syntax is quite cumbersome in PowerShell: because PowerShell does not support extension-method syntax, the static APIs must be called directly. An easier approach is to use the intrinsic .Where method:

$posta = 'üıöçşğ'
[string]::new($posta.Normalize([Text.NormalizationForm]::FormD).ToCharArray().
    Where{ [char]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark }).
    Replace('ı', 'i')

A simpler approach uses the -replace operator (thanks to mklement0 for the tip):

$posta = 'üıöçşğ'
$posta.Normalize('FormD') -replace '\p{M}' -creplace 'ı', 'i'

See Unicode category or Unicode block: \p{} for details.
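
As a rough sketch of how the -replace approach above could be wrapped for reuse, the snippet below puts it in a function. The function name ConvertTo-AsciiLetters and its $Text parameter are hypothetical and not part of the original answer. It also illustrates that uppercase letters such as İ, Ö and Ğ decompose under FormD as well, so only the dotless ı needs the explicit, case-sensitive replacement.

```powershell
# Hypothetical helper; the name and parameter are illustrative assumptions.
function ConvertTo-AsciiLetters {
    param([Parameter(Mandatory)][string] $Text)

    # FormD splits e.g. 'ü' into 'u' + a combining diaeresis;
    # '\p{M}' then matches the combining marks, and -replace without
    # a replacement operand removes them.
    # 'ı' (U+0131) has no decomposition, so it is replaced explicitly.
    # -creplace is case-sensitive, so an ASCII 'I' is left untouched.
    $Text.Normalize([Text.NormalizationForm]::FormD) -replace '\p{M}' -creplace 'ı', 'i'
}

ConvertTo-AsciiLetters 'eylül'             # -> eylul
ConvertTo-AsciiLetters 'İstanbul ÖĞRENCİ'  # -> Istanbul OGRENCI
```

Note that this strips diacritics from any letter, not only Turkish ones, which matches the general approach of this answer.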

Santiago Squarzon

To complement Santiago Squarzon's helpful answer, here is a PowerShell (Core) 7+ alternative. It builds on the guidance in this helpful answer, which explains that a fixed set of 12 characters in the Turkish alphabet have equivalent ASCII characters:

  • Note: Santiago's answer uses an approach that generally removes the accents (diacritics) from letters, and then adds Turkish-specific ı handling. The solution below is specific to the Turkish alphabet (and is case-sensitive; it could easily be adapted to be case-insensitive / all-lowercase).
# PS 7+

$turkish = 'ıİöÖçÇüÜğĞşŞ' # Turkish chars. with ASCII equivalents...
$ascii   = 'iIoOcCuUgGsS' # ...and their ASCII equivalents.

# Replace those Turkish chars. that have ASCII equivalents with their equivalents.
# ->  'A_iIoOcCuUgGsS_Z'
'A_ıİöÖçÇüÜğĞşŞ_Z' -replace ('[' + $turkish + ']'),
                            { $ascii[$turkish.IndexOf($_.Value)] }

This solution relies on a PowerShell (Core) 7+ feature of the regex-based -replace operator, namely the option to pass a script block ({ ... }) as the replacement operand, which enables determining the replacement value algorithmically, based on each reported match ($_.Value).
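
For sessions where a script block cannot be passed to -replace (such as Windows PowerShell), a comparable sketch can call the .NET [regex]::Replace method directly, with the script block converted to a MatchEvaluator delegate. This is an adaptation rather than part of the original answer; the $result variable and the $m parameter name are assumptions made for illustration.

```powershell
$turkish = 'ıİöÖçÇüÜğĞşŞ' # Turkish chars. with ASCII equivalents...
$ascii   = 'iIoOcCuUgGsS' # ...and their ASCII equivalents, in the same order.

# The script block is converted to a MatchEvaluator delegate and is called
# once per match; $m is the [Match] object for the current match.
# IndexOf() is given a [char] here, which makes the lookup ordinal.
$result = [regex]::Replace(
    'A_ıİöÖçÇüÜğĞşŞ_Z',
    ('[' + $turkish + ']'),
    { param($m) [string] $ascii[$turkish.IndexOf($m.Value[0])] }
)

$result  # -> A_iIoOcCuUgGsS_Z
```

The lookup relies on $turkish and $ascii listing the characters in the same order, exactly as in the -replace version above.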

mklement0