
I read that adding ? after a quantifier makes the regex match non-greedily, for example:

>>> import re
>>> x = "a (b) c (d) e"
>>> re.search(r"\(.*?\)", x).group()
'(b)'
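A minimal sketch of what the `?` changes on that sample string: the greedy `.*` runs to the end of the string and backtracks to the last `)`, while the lazy `.*?` stops at the first `)` it can reach.

```python
import re

x = "a (b) c (d) e"

# Greedy: .* consumes everything, then backtracks to the LAST ")"
greedy = re.search(r"\(.*\)", x).group()

# Lazy: .*? grows one character at a time and stops at the FIRST ")"
lazy = re.search(r"\(.*?\)", x).group()

print(greedy)  # (b) c (d)
print(lazy)    # (b)
```

Both matches start at the same `(`; only the end of the match differs.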

Why doesn't my code work?

import re
item = "等"  # item is a variable; here 等 means "etc."
content = "大数据,人工智能,云计算:数字孪生、5G,物联网和区块链等新一代数字技术应用"
res = re.search(r'(?<=[,,:、,])(.*?)(?=' + item + ')', content).group()
print(res)

I want to extract the text that lies between item (on the right) and the nearest delimiter symbol before it (on the left), so I expect res to be:

物联网和区块链

but it returns a much longer string. Why doesn't the non-greedy quantifier work here?
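A short sketch of what appears to be happening with the code above: `re.search` returns the match with the earliest start position, and laziness only decides where a match ends, never where it begins. The lookbehind first succeeds right after the first comma, and from there `.*?` is free to cross all the later delimiters until it reaches `item`.

```python
import re

item = "等"
content = "大数据,人工智能,云计算:数字孪生、5G,物联网和区块链等新一代数字技术应用"

m = re.search(r'(?<=[,,:、,])(.*?)(?=' + item + ')', content)

# re.search tries start positions from left to right and keeps the first that
# succeeds. The lookbehind first holds at index 4 (just after the first comma),
# and .* ? cannot stop early because it has not reached item yet.
print(m.start())  # 4
print(m.group())  # 人工智能,云计算:数字孪生、5G,物联网和区块链
```

So the lazy quantifier did work: this is the shortest match from index 4. The problem is the start position, which is why the character class needs to stop the match from crossing delimiters.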

4daJKong
  • Because `.` matches all of the `,,:、,` chars. Replace it with a negated character class if none should appear on the way from the punctuation chars to the `item`, i.e. `[^,,:、,]`. – Wiktor Stribiżew Jun 02 '22 at 07:23
  • @WiktorStribiżew thanks a lot, it works now! P.S. the regex I used is `r'[^,,:、,]*(?=' + item + ')'` – 4daJKong Jun 02 '22 at 07:35
  • Or, better, `re.search(fr'[,,:、,]([^,,:、,]*?){re.escape(item)}', content)`, then grab `.group(1)`. – Wiktor Stribiżew Jun 02 '22 at 07:38
  • Hi Wiktor, I have a question: if the starting delimiter is a word rather than a single symbol (for example t1 or t2), how should I change the pattern? I mean, where can I put the ^ for (t1|t2)? – 4daJKong Jun 02 '22 at 16:32
  • `re.search(fr'(?:t1|t2)((?:(?!t1|t2).)*?){re.escape(item)}', content)`, then grab `.group(1)` – Wiktor Stribiżew Jun 02 '22 at 16:34
  • @WiktorStribiżew Sorry to bother you again, and thanks for your help so far! May I ask a similar question? I actually want to find the substring that starts after t1 or t2 (not including them) and runs to the end of the string, and it should be non-greedy (the shortest such substring). I tried to follow your idea `fr'(?:t1|t2)((?:(?!t1|t2).)*?)'` but wasn't successful. Any suggestion would help! – 4daJKong Jun 02 '22 at 16:56
  • Just remove `?` at the end. `r'(?:t1|t2)((?:(?!t1|t2).)*)'` – Wiktor Stribiżew Jun 02 '22 at 17:38
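The single-character fixes from the comments can be checked against the question's data. This is a sketch: option A applies the first comment's advice (replace `.` with the negated class) to the original pattern, option B is the asker's simplified pattern, and option C is the capturing-group version, where `re.escape` keeps the pattern valid even if `item` contains regex metacharacters.

```python
import re

item = "等"
content = "大数据,人工智能,云计算:数字孪生、5G,物联网和区块链等新一代数字技术应用"

# A: original lookarounds, but [^...] instead of . so the lazy run
# can no longer cross a delimiter on its way to item.
a = re.search(r'(?<=[,,:、,])([^,,:、,]*?)(?=' + item + ')', content).group()

# B: no lookbehind; the first run of non-delimiters that is directly
# followed by item necessarily starts right after the last delimiter.
b = re.search(r'[^,,:、,]*(?=' + item + ')', content).group()

# C: the delimiter is consumed, the wanted text is captured in group 1,
# and re.escape() protects against metacharacters in item.
m = re.search(fr'[,,:、,]([^,,:、,]*?){re.escape(item)}', content)
c = m.group(1)

print(a, b, c)  # 物联网和区块链 物联网和区块链 物联网和区块链
```

In option C the overall match (`m.group(0)`) also includes the leading comma and `item`, which is why the result is read from group 1.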

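A sketch of the word-delimiter patterns on hypothetical sample strings (`text`, `tail_text` and the marker `END` are made up for illustration). `(?:(?!t1|t2).)` consumes one character only if neither t1 nor t2 starts at that position, so it plays the role that `[^...]` plays for single characters. Note that with `re.search` and no end anchor, the greedy form stops just before the next t1/t2; adding `$` (not part of the comment) appears to be needed if the substring must actually run to the end of the string.

```python
import re

item = "END"                          # hypothetical right-hand marker
text = "t1 alpha t2 beta END gamma"   # hypothetical sample

# Text between the nearest t1/t2 and item: the tempered token cannot
# step over a delimiter word, so the match from t1 fails and t2 wins.
between = re.search(fr'(?:t1|t2)((?:(?!t1|t2).)*?){re.escape(item)}', text).group(1)

tail_text = "t1 alpha t2 beta"        # hypothetical sample

# Pattern from the last comment: without an anchor, the match ends right
# before the next t1/t2, i.e. the segment after the FIRST delimiter.
first = re.search(r'(?:t1|t2)((?:(?!t1|t2).)*)', tail_text).group(1)

# With $ added, only a match starting at the LAST delimiter can reach
# the end of the string, which gives the shortest tail.
tail = re.search(r'(?:t1|t2)((?:(?!t1|t2).)*)$', tail_text).group(1)

print(repr(between), repr(first), repr(tail))  # ' beta ' ' alpha ' ' beta'
```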