
I'd like a regular expression that matches text which 1. contains Japanese characters (English-only text is not allowed), 2. may contain English (alphabetic) characters, and 3. may contain other common symbols such as -,.。、!!??@@・'’0-9.

The following satisfies requirement 1 (has Japanese characters), but not 2 and 3.

var regex = /^[^\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}\p{Script=Katakana_Or_Hiragana}\p]+$/
"パイソンが得意です".match(regex) // matches
"pythonが得意です".match(regex) // does not match
"I'm good at python.".match(regex) // does not match

The problem is that the following still doesn't satisfy 2 and 3, although I expected it to satisfy all three.

var regex = /^[^\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}\p{Script=Katakana_Or_Hiragana}\pA-Za-z\s'\.]+$/
"パイソンが得意です".match(regex) // matches
"pythonが得意です".match(regex) // still does not match
"I'm good at python.".match(regex) // still does not match

So what is wrong with this regex, and how should I fix it?

  • https://stackoverflow.com/questions/6787716/regular-expression-for-japanese-characters . please check this if it helps.. – thar45 Jul 05 '23 at 15:48

1 Answer

First, you need the u flag to use \p{}. Second, there's no script named Katakana_Or_Hiragana. Third, put everything you permit into a single (non-negated) character class.
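To illustrate the first point, here is a minimal sketch of how the same pattern behaves with and without the u flag. The variable names are made up for this example. Without the flag, JavaScript treats \p as an escaped literal p, so the pattern looks for the literal text "p{Script=Han}" rather than a Han character.

```javascript
// Same source text, compiled without and with the `u` flag.
const withoutU = /\p{Script=Han}/;  // \p is just "p" here, the braces are literal
const withU = /\p{Script=Han}/u;    // \p{...} is a Unicode property escape

console.log(withoutU.test("得意"));           // false: no literal "p{Script=Han}" in the input
console.log(withoutU.test("p{Script=Han}"));  // true: matched as plain text
console.log(withU.test("得意"));              // true: matches a Han character
```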

Get those three right, then add a lookahead to ensure that there is a Katakana/Hiragana/Kanji in the input string:

^
(?=.*[\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}])
[\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}a-zA-Z-,.。、!!??@@・'’0-9 ]+
$
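The pattern above is split across lines for readability only. JavaScript has no free-spacing flag, so in code it has to be written as one piece. A minimal sketch of one way to do that follows, run against the three inputs from the question. The names jp, japaneseText and isAllowedJapaneseText are hypothetical, the symbol list is copied from the pattern above with duplicates removed, and the hyphen is escaped so it cannot be read as a range.

```javascript
// Scripts that count as "Japanese" here (assumption: the same three as in the answer).
const jp = String.raw`\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}`;

// Lookahead: at least one Japanese character somewhere in the string.
// Main class: Japanese characters, ASCII letters, digits, space, listed symbols.
const japaneseText = new RegExp(
  String.raw`^(?=.*[${jp}])[${jp}a-zA-Z\-,.。、!?@・'’0-9 ]+$`,
  "u" // required for \p{...}
);

// Hypothetical helper wrapping the test.
function isAllowedJapaneseText(text) {
  return japaneseText.test(text);
}

console.log(isAllowedJapaneseText("パイソンが得意です"));  // true
console.log(isAllowedJapaneseText("pythonが得意です"));    // true
console.log(isAllowedJapaneseText("I'm good at python.")); // false: no Japanese character
```

Punctuation such as 。 and 、 is not covered by the three script properties, which is why it has to be listed in the class explicitly.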

Try it on regex101.com.

InSync