
Does locale in the .range method allow for automatic detection of special language characters?

I can't find any information on this, and it's not working in my implementation. If it doesn't, are there better methods of adding support for characters from different languages, or is hardcoding UTF values into the regex the only way?

The problem is that even if I hardcode the Danish characters into the solution, it might need to support other languages in the future. So what's the correct way to go about this?

import Foundation

func isUserNameValid(userName: String, locale: Locale) -> Bool {
    return userName.range(
        of: #"(?mi)^[a-z](?!(?:.*\.){2})(?!(?:.* ){2})(?!.*\.[a-z])[a-z. ]{1,}[a-z]$"#,
        options: .regularExpression,
        range: nil,
        locale: locale) != nil
}

let inputName = "Lærke"
if isUserNameValid(userName: inputName, locale: Locale(identifier: "da-DK")) {
    print("valid")
} else {
    print("not valid")
}
Peter Mortensen
crowjam
  • You mean you want to find strings that contain only letters in the [Danish alphabet](https://en.wikipedia.org/wiki/Danish_and_Norwegian_alphabet)? – Sweeper Jan 30 '21 at 10:12
  • No need to use any locales, just use `\p{L}` to match any Unicode letter. – Wiktor Stribiżew Jan 30 '21 at 12:20
  • OP, please clarify who (@WiktorStribiżew or me) understood your question correctly. Do you want to only match the letters that are in the alphabet of a certain language (e.g. Danish) or languages, or characters that are considered letters in any language? – Sweeper Jan 30 '21 at 12:53
  • Danish currently, but also with the option to extend it to other languages if need be – crowjam Jan 31 '21 at 13:08
  • You may find [this answer](https://stackoverflow.com/questions/65966539/how-to-use-regex-name-validation-for-danish-names-using-locale) helpful. Your function can be boiled down to `func isValid(userName: String, with locale: Locale = Locale.current) -> Bool { guard let localeCharacterSet = locale.exemplarCharacterSet else { return true } return userName.allSatisfy{ localeCharacterSet.contains($0.unicodeScalars.first!) } }` – Paul B Apr 11 '22 at 14:25

1 Answer


Does locale in the .range method allow for automatic detection of special language characters?

The locale parameter is there for locale-sensitive string comparisons. If you use the .regularExpression option, the locale parameter is ignored entirely, because the regex already specifies exactly how the comparison should be done.

Compare:

// nil
"I".range(of: "i", options: .caseInsensitive, range: nil, locale: Locale(identifier: "tr-TR"))

// not nil
"I".range(of: "(?i)i", options: .regularExpression, range: nil, locale: Locale(identifier: "tr-TR"))

In the first case, I use the Turkish locale to compare i and I case-insensitively. The comparison fails because in Turkish, the lowercase form of I is ı (U+0131 LATIN SMALL LETTER DOTLESS I), not i.

In the second case, I do the same thing but with a regex, and it successfully matches the I. This shows that when you use a regex, the locale is completely ignored.
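The Turkish casing rule behind the first case can be seen directly with `lowercased(with:)` and `uppercased(with:)`. This is only a small illustration; the constant name `turkish` is my own.

```swift
import Foundation

let turkish = Locale(identifier: "tr_TR")

// With the Turkish locale, I lowercases to the dotless ı (U+0131).
print("I".lowercased(with: turkish))

// Without a locale, the default Unicode mapping is used: I -> i.
print("I".lowercased())

// In the other direction, i uppercases to the dotted İ (U+0130).
print("i".uppercased(with: turkish))
```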

If I understood what you want to do correctly, the exemplarCharacterSet property of Locale might be useful to you. For most languages, it contains the characters commonly used in that language's writing system. You would probably need to check each Unicode scalar in the string one by one, rather than using a regex.
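A minimal sketch of that scalar-by-scalar check could look like the following. The helper name `usesOnlyExemplarCharacters` is hypothetical, and lowercasing the input first is an assumption on my part, in case the set for a given locale only lists lowercase letters.

```swift
import Foundation

// Hypothetical helper, not part of Foundation.
// Returns nil when the locale provides no exemplar character set,
// so the caller can choose its own fallback.
func usesOnlyExemplarCharacters(_ name: String, locale: Locale) -> Bool? {
    guard let exemplars = locale.exemplarCharacterSet else { return nil }

    // Normalize to precomposed form so that a letter such as "å" is a
    // single scalar, and lowercase the input (assumption: the set may
    // only contain lowercase letters).
    let normalized = name
        .precomposedStringWithCanonicalMapping
        .lowercased(with: locale)

    // Check every Unicode scalar against the locale's exemplar set.
    return normalized.unicodeScalars.allSatisfy { exemplars.contains($0) }
}

let danish = Locale(identifier: "da_DK")
print(usesOnlyExemplarCharacters("Lærke", locale: danish) as Any)
print(usesOnlyExemplarCharacters("Łukasz", locale: danish) as Any)
```

Note that this only checks letters. Spaces and periods are not exemplar characters, so the structural rules from your regex (at most one period, at most one space, and so on) would still need a separate check.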

Regexes can check for Unicode properties with \p, but those properties are too broad to describe the alphabet of one specific language. For example, all the letters in the Danish alphabet have the script property Latin, but so do many characters that are not in the Danish alphabet, like the dotless i.
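To illustrate how broad these properties are, here is a small sketch comparing `\p{L}` (any letter) with `\p{Script=Latin}`. The helper `matchesEntirely` and the constant names are hypothetical, introduced only for this example.

```swift
import Foundation

// Hypothetical helper: true if every character of `text` matches `pattern`.
func matchesEntirely(_ text: String, _ pattern: String) -> Bool {
    return text.range(of: "^(?:\(pattern))+$", options: .regularExpression) != nil
}

let anyLetter = #"\p{L}"#
let latinScript = #"\p{Script=Latin}"#

// Danish letters are both letters and Latin-script characters.
print(matchesEntirely("Lærke", anyLetter))
print(matchesEntirely("Lærke", latinScript))

// The dotless i is not in the Danish alphabet, yet it is Latin script.
print(matchesEntirely("ı", latinScript))

// Cyrillic text is made of letters, but not Latin-script ones.
print(matchesEntirely("Жанна", anyLetter))
print(matchesEntirely("Жанна", latinScript))
```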

Sweeper
  • This is good information, and I might use it in a future solution, but for now @Wiktor Stribiżew's suggestion to use `\p{L}` is working quite well for me, with all tests passing. This is the final regex: `(?mi)^[\p{L}](?!(?:.*\.){2})(?!(?:.* ){2})(?!.*\.[\p{L}])[\p{L}. ]{1,}[\p{L}]$` – crowjam Jan 31 '21 at 13:09