How to keep duplicated matched patterns when using re.search() on multiple strings?

Question

This question is different from Regular Expression for duplicate words. The linked question is about finding and returning duplicate matched patterns within a single string, such as "the the ..." or "my question, my answer", which return ['the', 'the'] or ['my', 'my']. My problem is that I use re.search() in a for loop, hoping it will return the duplicates across multiple strings. My search pattern is a digit followed by four lowercase letters (a-z), e.g. 2inch. See the code below for what I have tried so far:

import re
names = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']
sizes = []
for name in names:
    # Reassigning sizes on every iteration discards the previous match
    sizes = re.search(r'\d[A-Za-z]+', name).group()
    print(sizes)

Output:

['3inch']

Expected output:

['2inch', '2inch', '3inch', '3inch']

I'm not sure why the code seems to overwrite the duplicated search results and return only the last match. I want the expected output to be returned. How do I do this?
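The question describes the size as a digit followed by four lowercase letters, while the code uses \d[A-Za-z]+, which means one digit followed by one or more letters of either case. The two patterns agree on the question's data. The sketch below compares them. The second name, '4_abc-5INCHES-scatter', is hypothetical and is not part of the question's data.

```python
import re

name = '1_abc-2inch-scatter'
loose = r'\d[A-Za-z]+'   # pattern from the question: one digit, then 1+ letters of either case
strict = r'\d[a-z]{4}'   # one digit, then exactly four lowercase letters

print(re.findall(loose, name))   # ['2inch']
print(re.findall(strict, name))  # ['2inch']

# Hypothetical name (not from the question) where the two patterns disagree
other = '4_abc-5INCHES-scatter'
loose_hits = re.findall(loose, other)
strict_hits = re.findall(strict, other)
print(loose_hits)   # ['5INCHES']
print(strict_hits)  # []
```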

score 2 · Answer 1 · answered Aug 07 '23 at 19:00

You are overwriting the sizes variable on every iteration of the loop. Append each match to the list instead:

import re

names = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']
sizes = []

for name in names:
    match = re.search(r'\d[A-Za-z]+', name)
    if match:
        sizes.append(match.group())

print(sizes)
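The if match: guard is needed because re.search returns None when a string contains no match, and calling .group() on None raises AttributeError. The sketch below shows this with a hypothetical 'no-size-here' entry added to the list.

```python
import re

# 'no-size-here' is a hypothetical entry with no size in it
names = ['1_abc-2inch-scatter', 'no-size-here', '2_abc-3inch-uniform']

sizes = []
for name in names:
    match = re.search(r'\d[A-Za-z]+', name)
    # re.search returns None when nothing matches, so skip those names
    if match is not None:
        sizes.append(match.group())

print(sizes)  # ['2inch', '3inch']
```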

score 2 · Answer 2 · The fourth bird · 2023-08-07T21:07:30.607

As an alternative, you could make use of re.findall without looping:

print(re.findall(r'\d[A-Za-z]+', ' '.join(names)))

Or you can extend the sizes list with the list returned by re.findall on every iteration:

sizes = []
for name in names:
    sizes += re.findall(r'\d[A-Za-z]+', name)
print(sizes)

Output

['2inch', '2inch', '3inch', '3inch']
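The per-name re.findall approach can also be written as a single nested list comprehension. The outer loop runs over the names and the inner loop runs over each name's matches, so the per-name lists are flattened into one list. This is a sketch of an equivalent idiom, not part of the original answer.

```python
import re

names = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']

# Outer loop over names, inner loop over each name's findall results (flattened)
sizes = [size for name in names for size in re.findall(r'\d[A-Za-z]+', name)]

print(sizes)  # ['2inch', '2inch', '3inch', '3inch']
```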

edited Aug 07 '23 at 21:07

answered Aug 07 '23 at 19:53

The fourth bird

154,723
16
55
70

score 1 · Answer 3 · answered Aug 07 '23 at 19:03

I'm not a Python/Re expert, but is this what you want?

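import re


def get_size(names):
    size = []
    for name in names:
        # findall returns a list; [0] takes the first match
        # (raises IndexError if a name contains no size)
        size.append(re.findall(r'\d+inch', name)[0])
    return size


names = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']
print(get_size(names))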

Prints:

['2inch', '2inch', '3inch', '3inch']
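The \d+inch pattern in this answer and the \d[A-Za-z]+ pattern in the question handle multi-digit sizes differently. \d+ consumes every digit, while a single \d can only match the last digit before the letters. The name '3_abc-12inch-scatter' below is hypothetical. The question's data only has one-digit sizes.

```python
import re

# Hypothetical name with a two-digit size (not from the question's data)
name = '3_abc-12inch-scatter'

plus_inch = re.findall(r'\d+inch', name)         # \d+ takes every digit
single = re.findall(r'\d[A-Za-z]+', name)        # \d takes only one digit
plus_letters = re.findall(r'\d+[A-Za-z]+', name)

print(plus_inch)     # ['12inch']
print(single)        # ['2inch']
print(plus_letters)  # ['12inch']
```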

OneMadGypsy · Answer 4 · 2023-08-07T19:32:11.683

For your needs, you can join the data and use re.finditer to build your sizes list. This approach compiles the regex only once and reduces the "working" part of your code to a single line.

import re

names    = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']

finditer = re.compile(r'\d[a-z]+', re.I).finditer

sizes    = [m.group() for m in finditer(' '.join(names))]

print(sizes) #['2inch', '2inch', '3inch', '3inch']
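re.finditer yields Match objects rather than plain strings, so each hit also records where it was found in the input. The sketch below uses the same pattern and flag on a single name from the question and collects the text together with the start and end offsets.

```python
import re

name = '1_abc-2inch-scatter'

# Each Match object carries the matched text and its [start, end) position
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d[a-z]+', name, re.I)]

print(matches)  # [('2inch', 6, 11)]
```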

score 0 · Answer 5 · answered Aug 07 '23 at 20:39

"... My problem is when I use "re.search()" in a for loop and hoping it can return multiple duplicates from multiple strings. ..."

The re.search function returns a Match object for the first match only, or None if there is no match.
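The sketch below shows the first-match behaviour on a string that contains two matches. The name '1abc-2inch-scatter' is hypothetical. Without the underscore, '1abc' also fits \d[A-Za-z]+, so re.search stops there, while re.findall returns both matches.

```python
import re

# Hypothetical name without the underscore: '1abc' also fits the pattern
name = '1abc-2inch-scatter'

first = re.search(r'\d[A-Za-z]+', name).group()  # only the first match
every = re.findall(r'\d[A-Za-z]+', name)         # all non-overlapping matches

print(first)  # 1abc
print(every)  # ['1abc', '2inch']
```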

There is the re.findall function, which returns all non-overlapping matches as a list (of strings, or of tuples when the pattern has more than one group).
And there is the re.finditer function, which returns an iterator over Match objects for all matches.
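What re.findall puts in its list depends on the capturing groups in the pattern. re.finditer always yields Match objects. The sketch below uses one name from the question. The grouped patterns are illustrative variants of the question's regex.

```python
import re

name = '1_abc-2inch-scatter'

no_groups = re.findall(r'\d[A-Za-z]+', name)       # no groups: list of matched strings
one_group = re.findall(r'(\d)[A-Za-z]+', name)     # one group: list of that group's text
two_groups = re.findall(r'(\d)([A-Za-z]+)', name)  # 2+ groups: list of tuples

# finditer yields Match objects; .group() still returns the whole match
full = [m.group() for m in re.finditer(r'(\d)([A-Za-z]+)', name)]

print(no_groups, one_group, two_groups, full)
# ['2inch'] ['2'] [('2', 'inch')] ['2inch']
```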

Furthermore, your code should accumulate the results with list.append instead of reassigning sizes.

import re
names = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']
sizes = []
for name in names:
    for string in re.findall(r'\d[A-Za-z]+', name):
        sizes.append(string)
print(sizes)

Output

['2inch', '2inch', '3inch', '3inch']
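The inner loop that appends each string one by one can be replaced by list.extend, which adds every element of the list returned by re.findall in a single call. This sketch is a variant of the code above, not part of the original answer.

```python
import re

names = ['1_abc-2inch-scatter', '1_abc-2inch-uniform', '2_abc-3inch-scatter', '2_abc-3inch-uniform']
sizes = []
for name in names:
    # extend adds all of findall's matches at once (replaces the inner append loop)
    sizes.extend(re.findall(r'\d[A-Za-z]+', name))

print(sizes)  # ['2inch', '2inch', '3inch', '3inch']
```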

How to keep the duplicated matched patterns from re.search() multiple strings?

5 Answers5