
I'm creating a domain rule to reject values in my country domain that don't correspond to a country's long name, two-letter code, three-letter code, numeric code, or native name.

Could you please help me create a rule that captures native characters such as hiragana and Arabic letters?

This is what my domain values look like:

enter image description here

I'm trying to capture values like

  • البحرين(Bahrain)
  • বাংলাদেশ (Bangladesh)
  • កម្ពុជា (Cambodia)
  • United Arab Emirates (الإمارات العربية المتحدة)

This is my progress so far

enter image description here

Also, I'm wondering which regex dialect DQS uses.

Tomalak
Juan David

2 Answers


You can write a single regex that matches multiple alternatives. The basic form is this:

^(A|B|C|D)$

where A, B, C and D represent your allowed patterns, e.g. [0-9]{3} and so on. Read: https://www.regular-expressions.info/alternation.html

This way you only need a single (albeit longer) regex, which is probably easier to handle in the UI, and DQS only has to check the input value against a single expression, which is better for performance.
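
As a rough illustration of the alternation form above, here is a minimal Python sketch. The three sub-patterns and the name `is_country_code` are assumptions made up for this example, not something taken from DQS, and the regex dialect DQS uses may differ from Python's.

```python
import re

# Hypothetical sub-patterns; adjust them to the real domain values.
ALPHA2 = r"[A-Z]{2}"    # two-letter code, e.g. BH
ALPHA3 = r"[A-Z]{3}"    # three-letter code, e.g. BHR
NUMERIC = r"[0-9]{3}"   # numeric code, e.g. 048

# One expression of the form ^(A|B|C)$ built from the pieces above.
COUNTRY_CODE = re.compile(rf"^(?:{ALPHA2}|{ALPHA3}|{NUMERIC})$")


def is_country_code(value: str) -> bool:
    """Return True if the whole value matches one of the alternatives."""
    return COUNTRY_CODE.fullmatch(value) is not None
```

Keeping each alternative in its own named string makes the combined expression easier to maintain, even though only the single combined pattern is checked against the input.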

Tomalak
  • Thanks! Great suggestion. The only thing that I don't know how to do is to match patterns with international characters like البحرين – Juan David Jun 19 '19 at 19:03
  • International characters are just characters. Write them into the regex. – Tomalak Jun 19 '19 at 19:35
  • That's the point of the question: How should I add special characters like Hiragana, Katakana, Han, and Latin? – Juan David Jun 19 '19 at 20:33
  • Just write them into the regex as they are. "食" is not any less a character than "A" is. – Tomalak Jun 19 '19 at 23:09
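
To make the point in the comments concrete, here is a small Python sketch in which native-script names are written into the pattern as plain literals. The list of names and the function `is_known_native_name` are assumptions for illustration only; a real rule would list whatever names the domain allows.

```python
import re

# Native-script literals are written into the pattern exactly like ASCII text.
NATIVE_NAMES = re.compile(r"(?:البحرين|বাংলাদেশ|កម្ពុជា)")


def is_known_native_name(value: str) -> bool:
    """Return True if the value is exactly one of the listed native names."""
    return NATIVE_NAMES.fullmatch(value) is not None
```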

Maybe you can use Unicode categories in the regular expression. See how to do this here:

https://www.regular-expressions.info/unicode.html
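
The linked page describes `\p{L}`-style syntax. Whether DQS accepts that syntax is not confirmed in this thread. Python's built-in `re` module does not support `\p{...}`, so the sketch below only illustrates the idea of Unicode categories by checking each character with the standard `unicodedata` module. The function name `is_name_like` and the set of allowed punctuation are assumptions for this example.

```python
import unicodedata

# Assumption: besides letters, only spaces and parentheses are allowed.
ALLOWED_PUNCTUATION = set(" ()")


def is_name_like(value: str) -> bool:
    """Return True if every character is a letter (L*), a combining
    mark (M*), a space, or a parenthesis, and the value is not blank."""
    if not value.strip():
        return False
    return all(
        ch in ALLOWED_PUNCTUATION or unicodedata.category(ch)[0] in ("L", "M")
        for ch in value
    )
```

Marks (M*) are accepted alongside letters because scripts such as Bengali and Khmer write vowel signs as combining characters, which a letters-only check would reject.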

Jesus Rincon