
I have some text I am trying to use regexes on. I need to match numbers with a % sign that are surrounded by Arabic text. My regexes look like this:

const re1 = new RegExp('\\d*,\\d*%');
const re2 = new RegExp('%\\d*,\\d*');

I have some text that looks like this:

%9,2 ملمول/مول 77 أو 

I would expect the second regex to match the text, but it is the first one that matches. I'm sure there is a good reason for it, but I did not see anything in the documentation about this. Why does it do this? What is the correct way to match a number with a percent sign that is embedded in Arabic text?

Nicholas Carey
slipperypete
  • Could you please show an actual example with Arabic text? – Konrad Oct 14 '22 at 21:37
  • updated @KonradLinkowski – slipperypete Oct 14 '22 at 21:38
  • Isn't written Arabic a right-to-left script? – President James K. Polk Oct 14 '22 at 21:38
  • @PresidentJamesK.Polk I don't understand. Arabic is read right to left, but numbers are read left to right. But I still don't see how that would affect the regex matching – slipperypete Oct 14 '22 at 21:40
  • Regular expressions match the characters in string order, not display order. The string "\u0645\u0644\u0645\u0648\u0644/\u0645\u0648\u0644 77 \u0623\u0648 2,9%" when displayed in an RTL context displays the percent sign on the left, even though it is the last character in the string. You can force a right-to-left display context by putting an RLO in front (and a LRO at the end, just to balance it out): "\u202e\u0645\u0644\u0645\u0648\u0644/\u0645\u0648\u0644 77 \u0623\u0648 2,9%\u202d" – Raymond Chen Oct 14 '22 at 21:54
  • @RaymondChen Even if it does that, which I suspected because of how it indexes the string, why would it match the first regex and not the second? – slipperypete Oct 14 '22 at 21:59
  • Related question but not a dupe: [Regular Expression Arabic characters and numbers only](https://stackoverflow.com/questions/29729391/regular-expression-arabic-characters-and-numbers-only) – Yogi Oct 14 '22 at 22:06
  • The first regular expression looks for digits, a comma, more digits, and a percent sign. And the string "\u0645\u0644\u0645\u0648\u0644/\u0645\u0648\u0644 77 \u0623\u0648 2,9%" ends in a 2, comma, 9, and percent sign. The second regular expression looks for a percent sign, then digits, a comma, and digits. That string does not satisfy the second expression because the percent sign is at the end of the string. (Though it is displayed on the left.) – Raymond Chen Oct 14 '22 at 22:14
  • @RaymondChen Ok I get it. Thanks. – slipperypete Oct 14 '22 at 22:25
  • Numbers are not reversed in Arabic, so I guess I was tricking myself by thinking the regex would read that part the way a human does, which doesn't make sense. If you want to post that, I'll accept that answer. Thanks again. Kind of weird, though, that people get different results. – slipperypete Oct 14 '22 at 22:27
  • People get different results depending on how they print the Arabic string. If you print it in an English Web page, then the ambient direction is LTR. The `%` sign goes at the end, and in English, the "end" is the right hand side. But if you print it in an Arabic Web page, then the ambient direction is RTL. Again the percent sign goes at the "end", but in Arabic, the "end" is the left hand side. – Raymond Chen Oct 15 '22 at 00:39
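
The point made in the comments above can be checked with a minimal sketch. It assumes the text is stored in the logical order Raymond Chen describes (Arabic first, `2,9%` last) and writes it with `\u` escapes so the editor's display direction cannot hide the real character order. The variable names are illustrative only.

```javascript
// Logical (storage) order: Arabic words first, "2,9%" as the last characters.
const logical = '\u0645\u0644\u0645\u0648\u0644/\u0645\u0648\u0644 77 \u0623\u0648 2,9%';

const re1 = /\d*,\d*%/; // digits, comma, digits, percent sign
const re2 = /%\d*,\d*/; // percent sign, digits, comma, digits

const m1 = logical.match(re1);
const m2 = logical.match(re2);

// The percent sign is the final code unit of the string, whatever the display shows.
const percentIsLast = logical.indexOf('%') === logical.length - 1;

console.log(m1 && m1[0]);   // the text matched by re1, if any
console.log(m2);            // the result for re2
console.log(percentIsLast);
```

With the string stored this way, only `re1` can match, because nothing follows the percent sign in string order.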

2 Answers

I would expect the second regex to match the text

This is exactly what happens when the percent sign is the first character of the string, as it is in the literal below:

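// re1: digits, comma, digits, then a percent sign
const re1 = new RegExp('\\d*,\\d*%');
// re2: a percent sign, then digits, comma, digits
const re2 = new RegExp('%\\d*,\\d*');

const str = '%9,2 ملمول/مول 77 أو';

// Print each regex next to its match result (null when there is no match)
console.log(re1, str.match(re1));
console.log(re2, str.match(re2));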
Konrad

This works on my machine (macOS, US culture), running Node.js v14.19.3:

const corpus = '%9,2 ملمول/مول 77 أو ';

const rx = /%(\d+),(\d+)/ ;

const m = rx.exec(corpus);

if (!m) {
    console.log("no match found");
} else {
    const [ match, value1, value2 ] = m;
    console.log(`entire match: «${match}»`  );
    console.log(`1st value:    «${value1}»` );
    console.log(`2nd value:    «${value2}»` );
}

Running the above yields:

entire match: «%9,2»
1st value:    «9»
2nd value:    «2»
Nicholas Carey
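
Since the result depends on where the percent sign sits in string order, one possible approach is to accept both orders in a single pattern. The following is only a sketch: `findPercentages` is a hypothetical helper name, and it assumes ASCII digits, a comma as the decimal separator, and the ASCII `%` sign, as in the question.

```javascript
// Hypothetical helper: finds "%N,N" or "N,N%" in string (logical) order.
function findPercentages(text) {
    const rx = /%(\d+),(\d+)|(\d+),(\d+)%/g;
    const results = [];
    for (const m of text.matchAll(rx)) {
        // Groups 1-2 are set when % comes first; groups 3-4 when it comes last.
        const first = m[1] !== undefined ? m[1] : m[3];
        const second = m[2] !== undefined ? m[2] : m[4];
        results.push({ match: m[0], first, second });
    }
    return results;
}

// Same text in both storage orders, written with escapes to avoid display ambiguity.
const percentLast = '\u0645\u0644\u0645\u0648\u0644/\u0645\u0648\u0644 77 \u0623\u0648 2,9%';
const percentFirst = '%9,2 \u0645\u0644\u0645\u0648\u0644/\u0645\u0648\u0644 77 \u0623\u0648';

const lastResults = findPercentages(percentLast);
const firstResults = findPercentages(percentFirst);

console.log(lastResults);
console.log(firstResults);
```

The helper reports the two digit groups in string order, so the caller still has to decide which one is the integer part for the data at hand.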