The target string looks like this:

検索結果:100,000件

I am using the following regex pattern:

((?<!検索結果:)(?<!次の)(((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-9０-９]))(,|，|、)?).+((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-9０-９]).+)件)(?!表示)

As you can see, I want this pattern to exclude any number, written either in Arabic numerals or in Japanese kanji (Chinese character) numerals, that is preceded by "検索結果:" or "次の". However, for some reason the exclusion only works for numbers of up to 4 digits and fails for 6 digits.

In other words,

次の1000件

works (meaning it doesn't match anything), but

次の5,0000件

gives a partial match ("0000件").

I would like to know why it only works up to 4 digits, and ultimately how to make this regex match nothing at all in these cases. I know this regex is a bit messy. Thanks in advance for your feedback!
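The 4-digit limit follows from two properties of the pattern. First, the lookbehinds are only tested at the position where a match starts. Second, the rest of the pattern needs at least four characters before 件: a numeral, one character for the first `.+`, a second numeral, and one character for the second `.+`. In 次の1000件 the only start position with four characters left before 件 is the first digit, which directly follows 次の, so every attempt fails. With one more character, the engine can start at the second digit instead, where the text just before it is 次の1 or 次の5, and both lookbehinds pass. The sketch below reproduces this with Python's `re` module. It assumes the digit and separator alternatives in the post were meant to cover both half-width and full-width characters (`0-9`/`０-９`, `,`/`，`), and it prints where each match starts:

```python
import re

# The question's pattern, split up with re.VERBOSE so each part can be commented.
# Assumption: the digit ranges cover half-width and full-width digits, and the
# separators are "," "，" "、".
QUESTION_PATTERN = re.compile(
    r"""
    ((?<!検索結果:)(?<!次の)          # lookbehinds: tested only where a match starts
    (((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-9０-９]))
     (,|，|、)?)                      # first numeral + optional separator
    .+                                # at least one character of anything
    ((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-9０-９])
     .+)                              # second numeral + at least one more character
    件)(?!表示)
    """,
    re.VERBOSE,
)

for text in ["検索結果:100,000件", "次の1000件", "次の10000件", "次の5,0000件"]:
    m = QUESTION_PATTERN.search(text)
    if m is None:
        print(f"{text}: no match")
    else:
        # The lookbehinds only inspected text[:m.start()], which no longer ends in 次の
        print(f"{text}: {m.group(0)!r} at index {m.start()}, after {text[:m.start()]!r}")
```

For 次の5,0000件 this reports '0000件' starting at index 4, right after '次の5,'. The target string 検索結果:100,000件 is affected the same way: the match starts at index 6, after '検索結果:1'.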

Michael

2 Answers

You need to prevent a match from starting right after a digit, or after a digit followed by a separator, so add (?<![0-9０-９])(?<![0-9０-９][,，、]) right after (?<!次の):

(?<!検索結果:)(?<!次の)(?<![0-9０-９])(?<![0-9０-９][,，、])(?:[〇一二三四五六七八九十百千万億兆0-9０-９]|京+)[,，、]?.+[〇一二三四五六七八九十百千万億兆京0-9０-９].+件
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the regex demo.
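For a local check, the same pattern can be run with Python's `re` module, which accepts these fixed-width lookbehinds. This is a sketch that assumes the digit and separator classes cover both half-width and full-width characters (`[0-9０-９]`, `[,，、]`). The two added lookbehinds reject every start position that follows a digit or a digit plus separator, so the engine can no longer begin a match in the middle of a number:

```python
import re

# The answer's pattern (half-width + full-width digits and separators assumed).
FIXED_PATTERN = re.compile(
    r"(?<!検索結果:)(?<!次の)"
    r"(?<![0-9０-９])(?<![0-9０-９][,，、])"  # the two added lookbehinds
    r"(?:[〇一二三四五六七八九十百千万億兆0-9０-９]|京+)[,，、]?"
    r".+[〇一二三四五六七八九十百千万億兆京0-9０-９].+件"
)

for text in ["検索結果:100,000件", "次の1000件", "次の5,0000件", "販売実績1,000件"]:
    m = FIXED_PATTERN.search(text)
    # Print the whole match, or None when nothing matches
    print(text, "->", m.group(0) if m else None)
```

The first three strings no longer match at all, while a number that is not preceded by 検索結果: or 次の, such as 販売実績1,000件, still matches in full ("1,000件").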

Wiktor Stribiżew

Here's one problem that I see so far:

販売実績100万件 販売実績100万件 販売実績1,000件 販売実績1,000件 販売実績1,000,000件です 100,000件 5000件

These all match, but the match also captures the irrelevant text between two matching numbers. For instance,

販売実績100万件販売実績100万件

as a single string also matches the part in between that is not supposed to match.
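The over-match comes from `.+`, which is greedy and accepts any character, including 件 and 販売実績, so the engine extends the match to the last 件 it can reach. The sketch below reproduces this with the pattern from the answer above (half-width and full-width digits assumed). `NUMBER_ONLY` is a hypothetical tightening, not taken from the thread: it assumes that between the first numeral and 件 only numerals and the separators , ， 、 can appear, which keeps each match inside a single number. It also keeps the question's `(?!表示)` check.

```python
import re

# Numeral characters used in the thread's patterns (half-width and
# full-width digits assumed).
NUM = "〇一二三四五六七八九十百千万億兆京0-9０-９"

# The answer's pattern: ".+" is greedy and accepts any character, so one
# match can run from the first number to the last 件 in the string.
FIXED_PATTERN = re.compile(
    r"(?<!検索結果:)(?<!次の)(?<![0-9０-９])(?<![0-9０-９][,，、])"
    r"(?:[〇一二三四五六七八九十百千万億兆0-9０-９]|京+)[,，、]?"
    r".+[〇一二三四五六七八九十百千万億兆京0-9０-９].+件"
)

# Hypothetical tightening: the number may not start right after a numeral
# (or numeral + separator), and only numerals/separators may precede 件.
NUMBER_ONLY = re.compile(
    rf"(?<!検索結果:)(?<!次の)(?<![{NUM}])(?<![{NUM}][,，、])"
    rf"[{NUM}][{NUM},，、]*件(?!表示)"
)

joined = "販売実績100万件販売実績100万件"
print(FIXED_PATTERN.findall(joined))  # one match spanning both numbers
print(NUMBER_ONLY.findall(joined))    # one match per number

sample = ("販売実績100万件 販売実績100万件 販売実績1,000件 販売実績1,000件 "
          "販売実績1,000,000件です 100,000件 5000件")
print(NUMBER_ONLY.findall(sample))

# The contexts the question wants to exclude still produce no match.
for text in ["検索結果:100,000件", "次の1000件", "次の5,0000件"]:
    print(text, "->", NUMBER_ONLY.findall(text))
```

Because the lookbehinds in `NUMBER_ONLY` also cover kanji numerals, a number directly preceded by a character such as 一 inside an ordinary word will not match; whether that trade-off is acceptable depends on the data.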

https://regex101.com/r/LfDPHE/1

Michael