Can anyone tell me a regular expression for Arabic characters in Ruby?
Viewed 905 times
2 Answers
6
You can use the \p{} character properties:
/\p{Arabic}/
Example:
"مرحبا بالعالم".scan(/\p{Arabic}+/)
# ["\u0645\u0631\u062D\u0628\u0627", "\u0628\u0627\u0644\u0639\u0627\u0644\u0645"]
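As a further illustration (not part of the original answer), the same property can be anchored to test whether a whole string consists only of Arabic-script characters and whitespace. This is a minimal sketch that assumes Ruby 1.9 or later; the method name `arabic_only?` is hypothetical.

```ruby
# Hypothetical helper: true when the string contains only Arabic-script
# characters and whitespace. Requires Ruby 1.9+, where \p{Arabic} is supported.
def arabic_only?(text)
  !!(text =~ /\A[\p{Arabic}\s]+\z/)
end

puts arabic_only?("مرحبا بالعالم")  # true
puts arabic_only?("hello مرحبا")    # false
```

Vocalized text contains combining marks that Unicode may assign to the Inherited script rather than Arabic, so it is worth testing with your own data before relying on this check.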

Yu Hao
-
/\p{Arabic}/ does not work in Ruby 1.8.7, which I am using in my project. Any idea for Ruby 1.8.7? – Sivananda Jan 12 '15 at 07:27
-
@Sivananda Probably not what you want to hear, but, update your Ruby version? – Yu Hao Jan 12 '15 at 12:00
-
@Sivananda Ruby 1.8.7 was [retired](https://www.ruby-lang.org/en/news/2013/06/30/we-retire-1-8-7/) over a year and a half ago. – Mark Thomas Jan 12 '15 at 12:17
-
@Yu Hao & Mark Thomas, thanks for your responses! But my client only uses the old Ruby version. Is there a way to convert our string into Unicode so that I can use this pattern: [\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]? I tried the Iconv library, ::Iconv.conv('UTF-8//IGNORE', 'UTF-8', 'لستتتثييي'), and it gives the following output: "\331\204\330\263\330\252\330\252\330\252\330\253\331\212\331\212\331\212" – Sivananda Jan 12 '15 at 14:01
1
List of Arabic character ranges:
[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]
source: https://stackoverflow.com/a/11323651/3035830
Example:
arabic = "لأَبْجَدِيَّة العَرَبِيَّة - الحُرُوُفْ العَرَبِيَةُ"
#=> "لأَبْجَدِيَّة العَرَبِيَّة - الحُرُوُفْ العَرَبِيَةُ"
# select keeps only the words that contain at least one character from the ranges
arabic.split(' ').select { |ab| ab =~ /[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]/ }
#=> ["لأَبْجَدِيَّة", "العَرَبِيَّة", "الحُرُوُفْ", "العَرَبِيَةُ"]
You can now add a check along these lines to validate whether a text is in Arabic.
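One way such a check might look, as a sketch rather than a definitive implementation: it treats a text as Arabic only when every whitespace-separated word is made up entirely of characters from the ranges listed above. The method name `arabic_text?` is hypothetical, and the `\u` escapes assume Ruby 1.9 or later.

```ruby
# Hypothetical validator: true when every whitespace-separated word consists
# solely of characters from the Arabic ranges listed in this answer.
# Assumes Ruby 1.9+ (the \u escapes are not understood by 1.8.7 regexps).
def arabic_text?(text)
  word_pattern = /\A[\u0600-\u06ff\u0750-\u077f\ufb50-\ufc3f\ufe70-\ufefc]+\z/
  words = text.split(' ')
  !words.empty? && words.all? { |word| word =~ word_pattern }
end

puts arabic_text?("\u0645\u0631\u062D\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645")  # true
puts arabic_text?("\u0645\u0631\u062D\u0628\u0627 - hello")                                    # false
```

Because the ranges are whole code blocks, they also cover the Arabic diacritics in U+064B to U+0652; punctuation such as "-" makes the check fail, which may or may not be what you want.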
-
I used the above regular expression, but it's not working: patt = /[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]/ => /[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]/ 1.8.7-p376 :002 > str = "هْلِهِ وَجِيْرَانِهِ وَأَنْ يَبْذُلَ كُلَّ " 1.8.7-p376 :003 > str.match(patt) => nil – Sivananda Jan 12 '15 at 06:14
-
@Sivananda I updated the answer with an example. Can you check again? The character sets seem to work fine. – shivam Jan 12 '15 at 06:16
-
@muistooshort I tested the above example in irb; it gave the following output: ["\331\204\330\243\331\216\330\250\331\222\330\254\331\216\330\257\331\220\331\212\331\216\331\221\330\251", "\330\247\331\204\330\271\331\216\330\261\331\216\330\250\331\220\331\212\331\216\331\221\330\251", "-", "\330\247\331\204\330\255\331\217\330\261\331\217\331\210\331\217\331\201\331\222", "\330\247\331\204\330\271\331\216\330\261\331\216\330\250\331\220\331\212\331\216\330\251\331\217"] – Sivananda Jan 12 '15 at 06:26
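The octal escapes in these outputs are the UTF-8 bytes of the Arabic text as Ruby 1.8.7 displays them, so the strings themselves are intact. For an interpreter whose regexps lack `\u` and `\p{Arabic}`, one possible workaround is to decode the bytes into codepoints with `String#unpack('U*')` and compare them against the ranges numerically. This is only a sketch: it assumes the input is valid UTF-8, the method names are hypothetical, and it has been written to run on current Ruby rather than verified on a 1.8.7 install.

```ruby
# Hypothetical helper: true when the codepoint falls inside one of the
# Arabic ranges quoted in the thread.
def arabic_codepoint?(codepoint)
  ranges = [0x0600..0x06FF, 0x0750..0x077F, 0xFB50..0xFC3F, 0xFE70..0xFEFC]
  ranges.any? { |range| range.include?(codepoint) }
end

# Hypothetical helper: decodes UTF-8 bytes to codepoints with unpack('U*')
# and reports whether any of them is Arabic. No regexp features are needed.
def contains_arabic?(text)
  text.unpack('U*').any? { |codepoint| arabic_codepoint?(codepoint) }
end

# The first three characters of the Iconv output quoted in the comments above.
sample = "\331\204\330\263\330\252"
puts contains_arabic?(sample)   # true
puts contains_arabic?("hello")  # false
```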