
I have this kind of text:

other text opt1 opt2 opt3 I_want_only_this_text because_of_this

I am using this regex:

(?<=opt1|opt2|opt3).*?(?=because_of_this)

It returns:

opt2 opt3 I_want_only_this_text

However, I want to match only "I_want_only_this_text".

What is the best way to achieve this?

I don't know in what order the opts will appear, and they are only examples. The actual words will be different, and there will be more of them.
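For anyone who wants to reproduce this outside a regex tester, here is a minimal sketch. It assumes Python's built-in re module (the question does not say which engine is used), which accepts this lookbehind only because all three alternatives have the same length. The variable names are illustrative.

```python
import re

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"

# The lookbehind is already satisfied right after "opt1", so the lazy .*?
# starts there and runs up to the lookahead, swallowing opt2 and opt3.
pattern = re.compile(r"(?<=opt1|opt2|opt3).*?(?=because_of_this)")

match = pattern.search(text)
result = match.group(0).strip()  # drop the surrounding spaces
print(result)
```

The lazy quantifier only limits how far the match extends to the right; it does not move the starting point past the first position where the lookbehind succeeds.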


Actual data:

Regex:

(?<=※|を|備考|町|品は|。).*(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)

Text:

こだわり豚には通常の豚よりビタミンB1が2倍以上あります。私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります

What I want to get:

冷凍で

SoluriX

3 Answers


You could add a negative lookahead (?!\s*opt\d) to assert that what follows is not another opt plus a digit. You can also use a character class to list the digits 1, 2 and 3 instead of an alternation with |.

(?<=\bopt[123]\s(?!\s*opt\d)).*?(?=\s*\bbecause_of_this\b)

Regex demo

It might be a bit more efficient to use a match with a capture group:

\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b

Regex demo
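As a quick check of the capture group variant, here is a minimal sketch assuming Python's re module; the variable names are illustrative and not part of the answer.

```python
import re

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"

# Match an opt that is NOT directly followed by another opt, then capture
# lazily up to the closing phrase.
pattern = re.compile(r"\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b")

match = pattern.search(text)
captured = match.group(1) if match else None
print(captured)
```

The wanted text is in group 1 rather than in the whole match, so no lookbehind is needed.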

The fourth bird
  • I'm sorry for not being clear enough. The opts are just examples. The actual words/chars will be different and can't be processed like this. – SoluriX Apr 09 '21 at 07:47
  • Can you give an example of the actual data? – The fourth bird Apr 09 '21 at 07:48
  • regex: (?<=※|を|備考|町|品は|。).*(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします) text: こだわり豚には通常の豚よりビタミンB1が2倍以上あります。私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります I want only 冷凍で – SoluriX Apr 09 '21 at 07:51
  • @BartoszLulka You might take an approach like in the answer of JvdV and use a capture group `.*(?:※|を|備考|町|品は|。)(.*?)(?:のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)` See https://regex101.com/r/nHToyE/1 – The fourth bird Apr 09 '21 at 08:03

What about:

.*\bopt[123]\b\s*(.*?)\s*because_of_this\b

See the online demo.

  • .* - A greedy match of any characters other than newline, up to the last occurrence of:
  • \bopt[123]\b - A word boundary, followed by literally "opt" with a trailing digit 1, 2 or 3, and another word boundary.
  • \s* - 0+ whitespace characters.
  • (.*?) - A 1st capture group with a lazy match of 0+ characters up to:
  • \s* - 0+ whitespace characters.
  • because_of_this\b - Literally "because_of_this" followed by a word boundary.

If you need to have this written out in alternations:

.*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b

See that demo.
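Since the real word lists are longer and will change, the alternation can be built from lists. The following is a sketch only, assuming Python's re module. The helper name extract_after_last is hypothetical, and the \b anchors are dropped here on the assumption that word boundaries are not useful for the Japanese data.

```python
import re

def extract_after_last(text, starters, enders):
    """Return the text between the last usable starter and the ender after it."""
    start_alt = "|".join(re.escape(s) for s in starters)
    end_alt = "|".join(re.escape(e) for e in enders)
    # The greedy .* backtracks from the end, so the starter that matches is
    # the last one that is still followed by an ender; (.*?) is the wanted part.
    pattern = rf".*(?:{start_alt})\s*(.*?)\s*(?:{end_alt})"
    match = re.search(pattern, text)
    return match.group(1) if match else None

sample = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"
print(extract_after_last(sample, ["opt1", "opt2", "opt3"], ["because_of_this"]))

jp_starters = ["※", "を", "備考", "町", "品は", "。"]
jp_enders = ["のお届けとなります", "でお届けします",
             "にてお届け致します", "にてお届けいたします"]
# Tail of the text from the question.
jp_text = "提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります"
print(extract_after_last(jp_text, jp_starters, jp_enders))
```

Without \b a starter can also match inside a longer word, so add the boundaries back if the words are space-separated.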

JvdV

You can use

(?<=※|を|備考|町|品は|。)(?:(?!※|を|備考|町|品は|。).)*?(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)

See the regex demo. The scheme is the same as in (?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this) (see demo).

The tempered greedy token solution allows you to match multiple occurrences of the same pattern in a longer string.

Details

  • (?<=※|を|備考|町|品は|。) - a positive lookbehind that matches a location that is immediately preceded by one of the alternatives listed in the lookbehind
  • (?:(?!※|を|備考|町|品は|。).)*? - any char other than a line break char that does not start any of the alternatives in the negative lookahead, zero or more occurrences but as few as possible
  • (?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします) - a positive lookahead that requires one of the alternative patterns to appear immediately to the right of the current location.
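The tempered greedy token can be tried out as follows. This is a sketch assuming Python's re module, which is an assumption on my part, since the pattern above relies on an engine that allows lookbehind alternatives of different lengths. As far as I know Python's re rejects that with "look-behind requires fixed-width pattern", so the second pattern consumes the starter and captures the rest instead. The string long_text is a made-up example and is not from the question.

```python
import re

# All alternatives have the same length, so the lookbehind works as written.
opt = "opt1|opt2|opt3"
opt_pattern = re.compile(rf"(?<={opt})(?:(?!{opt}).)*?(?=because_of_this)")

# Hypothetical longer string with two occurrences.
long_text = "opt1 opt2 first because_of_this opt3 second because_of_this"
opt_matches = [m.strip() for m in opt_pattern.findall(long_text)]
print(opt_matches)

# Starters of different lengths: consume the starter with a non-capturing
# group and capture the tempered part, instead of using a lookbehind.
starters = "※|を|備考|町|品は|。"
enders = "のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします"
jp_pattern = re.compile(rf"(?:{starters})((?:(?!{starters}).)*?)(?={enders})")

# Tail of the text from the question.
jp_text = "提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります"
jp_match = jp_pattern.search(jp_text)
jp_result = jp_match.group(1) if jp_match else None
print(jp_result)
```

Because the tempered part can never run across another starter, each match begins after the starter closest to its ender.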
Wiktor Stribiżew