
I searched web pages and Stack Overflow for how to validate a Persian (Farsi) string. Most of the results only mention Arabic letters. I also want to know whether my string is fully Persian (contains nothing else). For example, these strings are Persian:

"چهار راه"

"خیابان."

And these are not:

"خیابان 5"

"چرا copy کردی؟"

Also, only Persian or Arabic digits are allowed. The characters [.,-!] are exceptions (because keyboards do not support these characters in Persian).

UPDATE: I explained a Swift version using a regex and a predicate in my answer.

iman kazemayni
  • You must use the Unicode range U+0600 to U+06FF – ares777 Dec 07 '18 at 12:30
  • I don't know what languages you use, but I see some nice validation work here: https://github.com/anetwork/validation/blob/master/src/ValidationRules.php. I will try to see if the regex works in Swift. – ares777 Dec 07 '18 at 12:38
  • Basic validation works with ("[\u{600}-\u{6FF}\u{064b}\u{064d}\u{064c}\u{064e}\u{064f}\u{0650}\u{0651}]") ... now you must define your regex properly to exclude all sets which are not in range. – ares777 Dec 07 '18 at 13:47
  • Possible duplicate of [regex for accepting only persian characters](https://stackoverflow.com/questions/22565100/regex-for-accepting-only-persian-characters) – revo Dec 16 '18 at 14:35
  • @revo: As you can see, my question and my answer are about Swift. Many programmers like me search for a Swift solution. However, regex usage is common across languages. – iman kazemayni Dec 16 '18 at 17:07
  • And you need to know how to match them. This is a job for regular expressions, and the same regex from the above answer works in Swift. – revo Dec 16 '18 at 17:41

2 Answers

Based on this extension found elsewhere:

    import Foundation

    extension String {
        // True if the regex matches anywhere in the string.
        func matches(_ regex: String) -> Bool {
            return self.range(of: regex, options: .regularExpression, range: nil, locale: nil) != nil
        }
    }

Then construct a regex containing the allowed characters, for example:

    let mystra = "چهار راه"
    let mystrb = "خیابان."
    let mystrc = "خیابان 5"
    let mystrd = "چرا copy کردی؟"      // and so on
    // Allowed set: the Arabic block U+0600-U+06FF plus space. Add the code points
    // for dot, comma, and any other punctuation marks you need.
    let allowed = "[\u{600}-\u{6FF}\u{064b}\u{064d}\u{064c}\u{064e}\u{064f}\u{0650}\u{0651}\u{0020}]"
    for a in mystra {
        if !String(a).matches(allowed) {   // not in range
            print("oh no--\(a)---zzzz")
            break                          // or return false
        }
    }

Make sure you build the set of Unicode code points you need using the model above. Results for the other strings (`for a in mystrb`, etc.):

    oh no--.---zzzz
    oh no--5---zzzz
    oh no--c---zzzz
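The per-character loop can also be collapsed into one anchored match over the whole string. This is only a minimal sketch: `consistsOnly(of:)` and `allowedClass` are hypothetical names, and the allowed set is the same assumed one as above (the U+0600-U+06FF block plus space), so extend it with whatever punctuation you need.

```swift
import Foundation

extension String {
    /// Hypothetical helper: true when the whole string consists only of
    /// characters from `characterClass` (the inside of a regex `[...]`).
    func consistsOnly(of characterClass: String) -> Bool {
        // \A and \z anchor the match to the start and end of the string,
        // so a single disallowed character anywhere makes the match fail.
        let pattern = "\\A[" + characterClass + "]+\\z"
        return range(of: pattern, options: .regularExpression) != nil
    }
}

// Same assumed set as the loop above: the Arabic block U+0600-U+06FF plus space.
let allowedClass = "\u{0600}-\u{06FF}\u{0020}"

for candidate in ["چهار راه", "خیابان.", "خیابان 5", "چرا copy کردی؟"] {
    print(candidate, candidate.consistsOnly(of: allowedClass))
}
```

Because of the `+` quantifier, an empty string is rejected; switch to `*` if empty input should count as valid.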

Enjoy

ares777

After a while I found a better way:

    import Foundation

    extension String {
        var isPersian: Bool {
            let predicate = NSPredicate(format: "SELF MATCHES %@",
                                        "([-.]*\\s*[-.]*\\p{Arabic}*[-.]*\\s*)*[-.]*")
            return predicate.evaluate(with: self)
        }
    }

You can use it like this:

    print("yourString".isPersian) // prints true or false

The key is combining a regex with a predicate. These links will help you adapt it to whatever you need:

https://nshipster.com/nspredicate/

https://nspredicate.xyz/

http://userguide.icu-project.org/strings/regexp

Feel free to ask any questions about this topic :D

[EDIT] The following regex can be used to accept Latin digits as well, since they are commonly accepted in Persian text:

    "([-.]*\\s*[-.]*\\p{Arabic}*[0-9]*[-.]*\\s*)*[-.]*"

iman kazemayni
  • This will accept Sindhi characters like ٻ and ٿ as Persian, as well as Arabic ٤. Do you mean that? The referenced C# answer seems to have a much more accurate regex. – Rob Napier Dec 16 '18 at 17:45
  • `\p{Arabic}` includes ~1000 characters (some of which are `٭ ٪ ؉ ؊ ؈ ؎ ؏ ۞ ۩`). Are you sure you required this? – revo Dec 16 '18 at 17:46
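Following up on the two comments above, here is a rough sketch of a narrower check that lists the allowed code points explicitly instead of relying on `\p{Arabic}`. Everything in it is an assumption rather than the accepted answer's method: the name `isStrictlyPersian` is hypothetical, the letter set is the 32 Persian letters plus آ, digits are the Persian and Arabic-Indic ones the question allows, and the zero-width non-joiner is permitted because Persian text commonly uses it. Hamza forms and diacritics are deliberately left out and would need to be added for real-world text.

```swift
import Foundation

extension String {
    /// Hypothetical stricter variant of `isPersian`.
    var isStrictlyPersian: Bool {
        // Assumption: the 32 Persian letters plus آ, listed by code point.
        let letters = "\u{0622}\u{0627}\u{0628}\u{062A}-\u{063A}\u{0641}\u{0642}"
            + "\u{0644}-\u{0648}\u{067E}\u{0686}\u{0698}\u{06A9}\u{06AF}\u{06CC}"
        // Persian digits (U+06F0-U+06F9) and Arabic-Indic digits (U+0660-U+0669).
        let digits = "\u{06F0}-\u{06F9}\u{0660}-\u{0669}"
        // Whitespace, zero-width non-joiner (assumption), and the . , ! -
        // exceptions named in the question.
        let extras = "\\s\\u200C.,!\\-"
        // \A and \z force the class to cover the entire string.
        let pattern = "\\A[" + letters + digits + extras + "]+\\z"
        return range(of: pattern, options: .regularExpression) != nil
    }
}

print("چهار راه".isStrictlyPersian)
print("چرا copy کردی؟".isStrictlyPersian)
```

With this set, Sindhi letters such as ٻ and symbols such as ٪ fall outside the class, while Arabic-Indic digits are still accepted because the question explicitly allows them.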