3

I have strings containing non-ASCII characters such as á, é, í, ó, ú, and I need a function that converts them to acceptable ASCII equivalents such as a, e, i, o, u. I will be creating IIS web sites from those strings (i.e. using them as domain names).

iRon
user8056359
  • 2
    In general, it's called transliteration. Normalizing to FormD and filtering will work to convert composed Latin letters to [Basic Latin](http://www.unicode.org/charts/nameslist/index.html) letters but not ligatures (dž, ǣ, ij, … ) and such. See this [question](https://stackoverflow.com/questions/1841874/how-to-transliterate-cyrillic-to-latin-text). – Tom Blodget Oct 10 '17 at 16:26

2 Answers

3
function Convert-DiacriticCharacters {
    param(
        [string]$inputString
    )
    # Decompose each accented letter into a base letter plus combining marks.
    [string]$formD = $inputString.Normalize(
            [System.Text.NormalizationForm]::FormD
    )
    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $stringBuilder = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $formD.Length; $i++) {
        $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
        # Keep everything except the combining (non-spacing) marks.
        if ($unicodeCategory -ne $nonSpacingMark) {
            $stringBuilder.Append($formD[$i]) | Out-Null
        }
    }
    # Recompose whatever is left into FormC.
    $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}

The resulting function converts diacritics in the following way:

PS C:\> Convert-DiacriticCharacters "Ångström"
Angstrom
PS C:\> Convert-DiacriticCharacters "Ó señor"
O senor

Copied from: http://cosmoskey.blogspot.nl/2009/09/powershell-function-convert.html
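As the comment under the question points out, normalizing to FormD only helps with letters that decompose into a base letter plus combining marks. Letters such as ø, æ, ß or ł have no such decomposition and pass through unchanged. A minimal sketch of one way to handle them, assuming a hand-written lookup table: the function name `Convert-ToBasicLatin` and the contents of `$fallbackMap` are hypothetical and deliberately incomplete, not part of the original answer.

```powershell
# Hypothetical helper: strips combining marks, then applies a small
# hand-written table for letters that FormD leaves intact.
function Convert-ToBasicLatin {
    param(
        [string]$InputString
    )

    # Assumed, incomplete mapping; extend it for your own data.
    # Keys are given as code points so the script does not depend on file encoding.
    $fallbackMap = @{}
    $fallbackMap[[char]0x00F8] = 'o'    # ø
    $fallbackMap[[char]0x00D8] = 'O'    # Ø
    $fallbackMap[[char]0x00E6] = 'ae'   # æ
    $fallbackMap[[char]0x00C6] = 'AE'   # Æ
    $fallbackMap[[char]0x00DF] = 'ss'   # ß
    $fallbackMap[[char]0x0142] = 'l'    # ł
    $fallbackMap[[char]0x0141] = 'L'    # Ł

    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $formD = $InputString.Normalize([System.Text.NormalizationForm]::FormD)
    $stringBuilder = New-Object System.Text.StringBuilder

    foreach ($char in $formD.ToCharArray()) {
        if ([System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($char) -eq $nonSpacingMark) {
            continue    # drop the combining mark
        }
        if ($fallbackMap.ContainsKey($char)) {
            # Letter without a decomposition: use the table entry instead.
            [void]$stringBuilder.Append($fallbackMap[$char])
        }
        else {
            [void]$stringBuilder.Append($char)
        }
    }

    $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}
```

With this table, `Convert-ToBasicLatin 'Smørrebrød'` should return `Smorrebrod` and `Convert-ToBasicLatin 'Straße'` should return `Strasse`. Which replacement is "right" for a given letter depends on the language, so treat the table as a starting point only.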

iRon
2

This answer to a C#/.NET question also seems to work in PowerShell when ported roughly like this:

function Remove-Diacritics
{
    Param([string]$Text)

    $chars = $Text.Normalize([System.Text.NormalizationForm]::FormD).GetEnumerator().Where{
        [System.Char]::GetUnicodeCategory($_) -ne [System.Globalization.UnicodeCategory]::NonSpacingMark
    }

    (-join $chars).Normalize([System.Text.NormalizationForm]::FormC)
}

For example:

PS C:\> Remove-Diacritics 'abcdeéfg'
abcdeefg
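Since the question is about using the result in domain names, stripping diacritics is probably only the first step: spaces and other punctuation are not valid in a host name label either. A rough sketch of a follow-up step, assuming the usual letters-digits-hyphen rule and the 63-character limit per DNS label. The function name `ConvertTo-HostLabel` is hypothetical, and replacing every other character with a hyphen is just one possible policy.

```powershell
# Hypothetical helper: turns free text into a single DNS-style label
# (lowercase a-z, 0-9 and hyphens, at most 63 characters).
function ConvertTo-HostLabel {
    param(
        [string]$Text
    )

    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark

    # Same idea as above: decompose, then drop the combining marks.
    $decomposed = $Text.Normalize([System.Text.NormalizationForm]::FormD)
    $chars = foreach ($c in $decomposed.ToCharArray()) {
        if ([System.Char]::GetUnicodeCategory($c) -ne $nonSpacingMark) { $c }
    }
    $label = (-join $chars).ToLowerInvariant()

    # Collapse every run of other characters into one hyphen.
    # -creplace keeps the match case-sensitive, so only ASCII a-z and 0-9 survive.
    $label = $label -creplace '[^a-z0-9]+', '-'

    # A label must not start or end with a hyphen.
    $label = $label.Trim('-')

    # Enforce the 63-character limit, then re-trim in case the cut ends on a hyphen.
    if ($label.Length -gt 63) {
        $label = $label.Substring(0, 63).TrimEnd('-')
    }

    $label
}
```

For instance, `ConvertTo-HostLabel 'Ó señor'` should give `o-senor`. Letters that do not decompose (ø, ß and so on) end up as hyphens under this policy, so map them beforehand if that matters.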
TessellatingHeckler