How can I use regex to match the substring that starts after given words (not included) and runs to the end of the string, while keeping the match non-greedy?

Question

I want to find the substring that starts after one of the words (\d月|\d日), which should not be included in the result, and runs to the end of the string. At the same time, the substring should be as short as possible (non-greedy). For example:

str1 = "秋天9月9日长江工程完成"
res1 = re.search(r'(\d月|\d日).*', str1).group()  # returns 9月9日长江工程完成

I want the result to be 长江工程完成. As another example:

str2 ="秋天9月9日9日长江工程完成"

It should give the same result as the previous one.

I tried the following patterns, but they all return unexpected results. Any suggestions would be appreciated.

res1 = re.search(r'(?:(?!\d月|\d日))(?:\d月|\d日)', str1).group() #return 9月
res1 = re.search(r'(?:\d月|\d日)((?:(?!\d月|\d日).)*?)', content).group()  #return 9月

Please explain more about the format. Do you want any number of groups like `\d月` and `\d日`? or just two? Can it be `日` first and `月` second? [More examples would help](https://meta.stackoverflow.com/a/418442/10295729) — decorator-factory, Jun 02 '22 at 17:29
@decorator-factory Thanks for the reply. I have rewritten the question, please have a look. — 4daJKong, Jun 02 '22 at 17:31
Remove last `?`. Use `re.search(r'(?:\d月|\d日)((?:(?!\d月|\d日).)*)$', content).group(1)`, see https://ideone.com/k6wuVO — Wiktor Stribiżew, Jun 02 '22 at 17:39

score 1 · Accepted Answer · answered Jun 02 '22 at 17:42

If you want to capture the rest of the string, surround .* with a group.

To capture one or more of the same pattern, you can use the + operator.

import re

content = "9月9日9月长江工程完成"
match = re.match(r'(?:\d月|\d日)+(.*)', content)
print(match[1])

Output:

长江工程完成

(?:(?!\d月|\d日))(?:\d月|\d日)

This pattern only captures the initial words, because you don't capture the rest as a group. (Also, it only allows for exactly two occurrences.)

(?:\d月|\d日)((?:(?!\d月|\d日).)*?)

This pattern only matches strings that look like this: 9月4日a6日b0月x, which is probably not what you need.
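
For the pattern as it appears in the question, here is a minimal sketch of why the lazy `*?` returns only the marker, and how dropping the `?` and anchoring with `$` (as suggested in the comments) changes that. The names `MARKER`, `lazy`, `anchored`, and `tail_after_last_marker` are made up for illustration; the sample strings are the ones from the question.

```python
import re

MARKER = r'(?:\d月|\d日)'  # illustrative name, not from the original post

# Lazy body with nothing after it: zero characters already satisfy the pattern,
# so the match ends right after the first marker.
lazy = re.compile(MARKER + r'((?:(?!\d月|\d日).)*?)')

# Greedy body anchored with $: the body has to reach the end of the string
# without stepping onto another marker, so only the last marker can start it.
anchored = re.compile(MARKER + r'((?:(?!\d月|\d日).)*)$')


def tail_after_last_marker(text):
    """Return the text after the last marker, or None if no marker is found."""
    m = anchored.search(text)
    return m.group(1) if m else None


str1 = "秋天9月9日长江工程完成"
str2 = "秋天9月9日9日长江工程完成"

lazy_match = lazy.search(str1)
print(repr(lazy_match.group()))      # '9月'
print(repr(lazy_match.group(1)))     # ''
print(tail_after_last_marker(str1))  # 长江工程完成
print(tail_after_last_marker(str2))  # 长江工程完成
```

Because the body is forbidden from containing a marker, this variant also handles markers that are separated by other text: the capture always starts after the last one.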

P.S. Make sure you pick the right function from the re module: match, search, or fullmatch (see What is the difference between re.search and re.match?). You said that the whole string needs to start with the given words, so use match or fullmatch.
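
A short sketch of how the three functions differ, using the pattern from this answer. The variable names `at_start` and `with_prefix` are illustrative; the second string is the one from the updated question, where other text precedes the markers.

```python
import re

pattern = re.compile(r'(?:\d月|\d日)+(.*)')

at_start = "9月9日长江工程完成"         # markers at the very beginning
with_prefix = "秋天9月9日长江工程完成"  # other text before the markers

# match only tries the pattern at position 0.
print(pattern.match(at_start).group(1))      # 长江工程完成
print(pattern.match(with_prefix))            # None

# search scans forward and tries every starting position.
print(pattern.search(with_prefix).group(1))  # 长江工程完成

# fullmatch requires the pattern to cover the whole string.
print(pattern.fullmatch(at_start).group(1))  # 长江工程完成
print(pattern.fullmatch(with_prefix))        # None
```

So when arbitrary text may come before the markers, search is the one to use. Note that `+` only skips markers that are directly adjacent; if other text sits between two markers, the capture starts after the first run of markers.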

Sorry to bother you again. Consider a more general situation: what if there are some words in front of \d月|\d日? I have rewritten the contents of str1 and str2, please have a look. By the way, does match mean searching only from the beginning of the string, and search mean searching from any position in the string? — 4daJKong, Jun 03 '22 at 03:04
