findall() returns only one result

Question

I am trying to use findall() to extract the text between two words, including the boundary words themselves.

import re

description = 'White cat sat on the mat and then the cat ran away'
starting_word = 'cat'
ending_word = 'ran'

# .*? is lazy: it stops at the first ending_word after starting_word
detail_re = r'{0}.*?{1}'.format(starting_word, ending_word)
extracted_text_list = re.findall(detail_re, description, re.IGNORECASE)

Expected result:

['cat sat on the mat and then the cat ran', 'cat ran']

However, the result is:

['cat sat on the mat and then the cat ran']

How can I get the expected answer?
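A minimal sketch of why only one item comes back, assuming Python 3 and the standard re module: findall() and finditer() report non-overlapping matches, so once the first match ends at "ran", scanning resumes after it and the second "cat" (which lies inside the first match) is never tried as a starting point. The variable names below are illustrative only.

```python
import re

description = 'White cat sat on the mat and then the cat ran away'
pattern = re.compile(r'cat.*?ran', re.IGNORECASE)

# finditer() yields non-overlapping matches: after a match, the search
# continues from the match's end, so it never restarts at the second "cat".
for m in pattern.finditer(description):
    print(m.span(), repr(m.group()))
# (6, 45) 'cat sat on the mat and then the cat ran'

# Starting the search explicitly at the second "cat" shows that it would
# match on its own; findall() simply never begins a match there.
second_cat = description.index('cat', 7)
print(pattern.search(description, second_cat).group())
# cat ran
```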

That is the expected answer. There's only one match for *detail_re* in your *description* string. If you want to capture the beginning part, you need r'*?{0}.*?{1}' — Jeff Gruenbaum, Aug 26 '22 at 14:43
Shorter example (one-liner): `re.findall('cat.*?ran', 'White cat sat on the mat and then the cat ran away')` — Thomas, Aug 26 '22 at 14:44
How could the first item in your expected results possibly be a match? It doesn't start with "cat". — jasonharper, Aug 26 '22 at 14:44
Try `detail_re = r'(.*\b({0})\b.*?\b({1})\b)'.format(starting_word, ending_word)` — Wiktor Stribiżew, Aug 26 '22 at 14:45
@JeffGruenbaum This regular expression throws the error 're.error: nothing to repeat at position 0'. I am expecting ['cat sat on the mat and then the cat ran', 'cat ran']. — nr spider, Aug 26 '22 at 14:52
@nrspider I forgot to add the period; it should be r'.*?{0}.*?{1}'. I didn't realize you had edited your expected result, so ignore the regex suggestion. It will still return only one match because your description contains only one *ran*, so the pattern can match only once. If you change your description to `description = 'White cat sat on the mat and then the ran cat ran away'`, you can see that it now matches twice. — Jeff Gruenbaum, Aug 26 '22 at 14:54
@WiktorStribiżew Thank you for your response, but it returns [('White cat sat on the mat and then the cat ran', 'cat', 'ran')] — nr spider, Aug 26 '22 at 14:54
@JeffGruenbaum I have edited the expected answer (removed 'White' from the results). My concern is extracting overlapping matches, which is why there are two results. — nr spider, Aug 26 '22 at 14:59
Good, so the best you can do is `detail_re = r'\b(({0})\b.*?\b({1}))\b'.format(starting_word, ending_word)`. You cannot put disjoint texts into one group. — Wiktor Stribiżew, Aug 26 '22 at 14:59
This works:
`import re`
`description = 'White cat sat on the mat and then the cat ran away'`
`detail_re = r'(?=(cat.*?ran))'`
`matches = re.finditer(detail_re, description)`
`extracted_text_list = [match.group(1) for match in matches]`
`print(extracted_text_list)`
— Jeff Gruenbaum, Aug 26 '22 at 15:05
Use a lookahead assertion: `detail_re = r'(?=({0}.*?{1}))'.format(starting_word, ending_word)`. The rest of the code is fine. — mrin9san, Aug 26 '22 at 15:09
The correct answer was provided by @mrin9san. There are two ways to do it.
Method 1: using re
`detail_re = r'(?=({0}.*?{1}))'.format(starting_word, ending_word)`
`extracted_text_list = re.findall(detail_re, description, re.IGNORECASE)`
Method 2: using the third-party regex module (`import regex`)
`detail_re = regex.compile(r'{0}.*?{1}'.format(starting_word, ending_word))`
`extracted_text_list = detail_re.findall(description, overlapped=True)`
In Method 2 the flag cannot be passed to findall(); it has to go to regex.compile() instead, e.g. `regex.compile(pattern, regex.IGNORECASE)`. — nr spider, Aug 26 '22 at 15:19
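The lookahead approach from the comments can be wrapped in a small reusable function. This is a sketch, not a library API: `find_overlapping_between` and its `whole_words` parameter are hypothetical names. The zero-width `(?=(...))` consumes no characters, so findall() tries every position and can start a new match inside a previous one, while the capture group holds the text. The sketch adds two things the one-liners leave out: re.escape(), so words containing regex metacharacters are matched literally, and optional `\b` word boundaries (as in Wiktor Stribiżew's suggestion), so "cat" does not match inside a word like "concatenate".

```python
import re

def find_overlapping_between(text, start_word, end_word,
                             flags=re.IGNORECASE, whole_words=False):
    """Hypothetical helper: return every shortest span from start_word to the
    next end_word, including spans that overlap each other."""
    # Escape the words so characters like '.' or '+' are taken literally.
    start, end = re.escape(start_word), re.escape(end_word)
    if whole_words:
        # Optional word boundaries so the words are not matched inside others.
        start, end = r'\b' + start + r'\b', r'\b' + end + r'\b'
    # Zero-width lookahead: matches at each position without consuming text;
    # the single capture group is what findall() returns.
    pattern = r'(?=({0}.*?{1}))'.format(start, end)
    return re.findall(pattern, text, flags)

description = 'White cat sat on the mat and then the cat ran away'
print(find_overlapping_between(description, 'cat', 'ran'))
# ['cat sat on the mat and then the cat ran', 'cat ran']
```

Because `.` does not match a newline by default, text that spans several lines would also need re.DOTALL, e.g. `flags=re.IGNORECASE | re.DOTALL`.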