I have this date string: السبت 18 سبتمبر/أيلول 2021

How can I use a regex to extract a valid date from it, without the weekday name and without the month name after the slash?

The target format is dd-month-yyyy.

Example of the data:

<p class="schedule__date sub-heading">الثلاثاء 21 سبتمبر/أيلول 2021</p>
<p class="schedule__date sub-heading">الأربعاء 22 سبتمبر/أيلول 2021</p>
Mohamad A Sallal
  • What you want is a regex that matches Unicode characters (Arabic characters and numbers), which I believe [this question](https://stackoverflow.com/questions/150033/regular-expression-to-match-non-ascii-characters) explains. – Mohamed abdelmagid Sep 18 '21 at 07:59
  • Yes, I tried that, but because I need both Arabic characters and ASCII digits, it is not working for me. – Mohamad A Sallal Sep 18 '21 at 08:02
  • Do your dates always come in this format, `السبت 18 سبتمبر/أيلول 2021`? And what would be your expected output for this input? – Alireza Sep 18 '21 at 08:04
  • Yes, it is always in the same format. My expected output is in the format dd-month-yyyy. – Mohamad A Sallal Sep 18 '21 at 19:49

1 Answer

To extract the whole date expression, e.g., الأربعاء 22 سبتمبر/أيلول 2021, try this:

(?<=>).*[^ -~]+\\s\\d+

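where [^ -~] matches any character outside the printable ASCII range (space through tilde), so it works as shorthand for non-ASCII characters. The doubled backslashes assume the pattern is written inside a string literal that needs escaping; in a raw regex, use \s\d+ instead.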
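As a quick check of the pattern above, here is a minimal sketch in Python (an assumption, since the question does not name a language). It uses a raw string, so the backslashes are single, and the variable names are illustrative only.

```python
import re

# One of the sample lines from the question.
html = '<p class="schedule__date sub-heading">الأربعاء 22 سبتمبر/أيلول 2021</p>'

# Same pattern as above: start right after a '>', then take everything
# up to the last "non-ASCII run + whitespace + digits" on the line.
whole_date = re.compile(r"(?<=>).*[^ -~]+\s\d+")

match = whole_date.search(html)
full_date = match.group(0) if match else None
print(full_date)
```

The match starts right after the first `>` and ends at the year, so the closing `</p>` is left out.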

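To extract just the part after the slash, e.g., أيلول 2021, try the pattern below. Note that it also matches the weekday and day number earlier in the string (e.g., الثلاثاء 21), so take the last match: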

[^ -~]+\\s\\d+
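If the goal is the day, the month name before the slash, and the year, one possible approach is to capture the three parts in groups and join them afterwards. The sketch below is in Python (an assumption); the pattern and the helper name `extract_date` are illustrative and not part of the patterns above. It assumes the date always has the form "weekday day month/month year" with ASCII digits.

```python
import re

# Day number, the month name before the slash, then (skipping the
# alternative month name after the slash) the four-digit year.
# '/' is printable ASCII, so [^ -~]+ stops right before it.
DATE_RE = re.compile(r"(\d{1,2})\s+([^ -~]+)/[^ -~]+\s+(\d{4})")

def extract_date(text, sep=" "):
    """Return 'dd<sep>month<sep>yyyy', or None if no date is found."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    day, month, year = m.groups()
    return sep.join((day, month, year))

line = '<p class="schedule__date sub-heading">الثلاثاء 21 سبتمبر/أيلول 2021</p>'
print(extract_date(line))           # day, first month name, year
print(extract_date(line, sep="-"))  # same parts joined with hyphens
```

Joining the groups in code keeps the logical order day, month, year regardless of how an editor or terminal renders the mixed right-to-left and left-to-right text.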
Chris Ruehlemann
  • I need the result to be like this: 22 سبتمبر 2021 (dd mon yyyy), but because it combines RTL text and ASCII digits it is hard to show here. Can you check here: https://regex101.com/r/ojgGkP/1 – Mohamad A Sallal Sep 22 '21 at 08:16