Finding a substring within a list in Python

Question

Background:

Example list: mylist = ['abc123', 'def456', 'ghi789']

I want to retrieve an element if there's a match for a substring, like 'abc'.

Code:

sub = 'abc'
print(any(sub in mystring for mystring in mylist))

The code above prints True if any element of the list contains the pattern.

I would like to print the element that matches the substring. So if I'm checking 'abc', I only want to print 'abc123' from the list.

You probably don't want to name a variable `list`, since that's the name of a built in data type (and you won't be able to do `list(x)` in the future) — David Robinson, Dec 08 '12 at 16:50
Try filter. Should do what you need. Example here http://stackoverflow.com/questions/3640359/regular-expressions-search-in-list — Dan, Dec 08 '12 at 16:52
"it will print True for every element in the list." You are confused; it will print `True` only once, because `any` returns a single boolean value. It means exactly what it says: it returns a boolean that indicates if `any` of the listed things are true. — Karl Knechtel, Dec 08 '12 at 18:50
I'm not confused. That was the output, so I must've made a mistake in using `any` because it printed `True` for every element in the list. — frankV, Dec 08 '12 at 19:51
There are too many incorrect statements in the question description. As @KarlKnechtel noted, the any(...) statement would only print `True` once! The first code snippet is incorrect because `print string` refers to a variable that is out of scope. Using the names `string` and `list` is against convention. I gave up on editing it! The later part needs to be completely rewritten! — Zahra, Jun 06 '18 at 17:48
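To illustrate the distinction discussed in these comments, here is a small sketch: `any()` collapses the whole generator into a single boolean, whereas a list comprehension over the same test produces one boolean per element. It only shows the two behaviors side by side; it is not a claim about exactly what code produced the output described in the question.

```python
mylist = ['abc123', 'def456', 'ghi789']
sub = 'abc'

# any() returns a single bool: True if at least one element contains sub
found = any(sub in mystring for mystring in mylist)
print(found)  # True

# A list comprehension evaluates the test per element, giving one bool each
per_element = [sub in mystring for mystring in mylist]
print(per_element)  # [True, False, False]
```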

score 191 · Accepted Answer · edited Dec 07 '15 at 18:40


print([s for s in mylist if sub in s])

If you want them separated by newlines:

print("\n".join(s for s in mylist if sub in s))

Full example, with case insensitivity:

mylist = ['abc123', 'def456', 'ghi789', 'ABC987', 'aBc654']
sub = 'abc'

print("\n".join(s for s in mylist if sub.lower() in s.lower()))
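A minimal sketch wrapping this answer in a reusable function. The name `find_matches` and its `ignore_case` parameter are hypothetical, introduced only for illustration. It swaps in `str.casefold()` (Python 3.3+) for `lower()`; casefold is intended for caseless matching and handles some non-ASCII mappings that `lower()` does not (for example, German 'ß' casefolds to 'ss').

```python
def find_matches(items, sub, ignore_case=False):
    """Return every string in items that contains sub (hypothetical helper)."""
    if ignore_case:
        # casefold() is a more aggressive lower() meant for caseless comparison
        needle = sub.casefold()
        return [s for s in items if needle in s.casefold()]
    return [s for s in items if sub in s]


mylist = ['abc123', 'def456', 'ghi789', 'ABC987', 'aBc654']
print(find_matches(mylist, 'abc'))                    # ['abc123']
print(find_matches(mylist, 'abc', ignore_case=True))  # ['abc123', 'ABC987', 'aBc654']
```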

edited Dec 07 '15 at 18:40

matt wilkie


answered Dec 08 '12 at 16:49

David Robinson


I think you should encourage use of generators when possible, such as in your second example :) – ThinkChaos Apr 24 '15 at 15:14
for case insensitivity use `sub.lower() in s` (thank you http://stackoverflow.com/questions/3627784/case-insensitive-in-python) – matt wilkie Dec 03 '15 at 22:39
@mattwilkie Note that to be case insensitive you'd need `sub.lower() in s.lower()`, or it won't work when `s` is not lowercase. – David Robinson Dec 04 '15 at 00:45
Oh, yes, thanks for catching that! The function I've been working on passed my prototype data but would have failed on the real thing next week. (Added example; remove if you don't like it.) – matt wilkie Dec 07 '15 at 18:39
Mind explaining the reasoning behind `s for s in list if sub in s`? What does it even mean? – Toskan Jul 02 '20 at 01:45
I need it the other way around: several subs/needles and one string/haystack. – Timo Nov 30 '20 at 16:44
@Toskan It keeps each `s` in the list for which `sub in s` is true, so the result can have zero, one or several elements. This is a list comprehension with a filter: `[ EXP for x in seq if COND ]` – Timo Dec 09 '20 at 19:01
That's really weird Python syntax; EXP and COND are so far apart... but thanks a lot for the explanation. – Toskan Dec 15 '20 at 20:31
Need to point out that using `list` as a variable name is bad practice since it overwrites the builtin `list` function. – Chris Collett May 11 '22 at 15:25
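To make the `[ EXP for x in seq if COND ]` form from the comments concrete, here is a sketch showing that the comprehension builds the same list as an explicit loop with an `if` inside it.

```python
mylist = ['abc123', 'def456', 'ghi789']
sub = 'abc'

# Comprehension form: [EXP for x in seq if COND], with EXP = s and COND = sub in s
comprehension = [s for s in mylist if sub in s]

# Equivalent explicit loop: test COND for each item, append EXP when it holds
loop_result = []
for s in mylist:
    if sub in s:
        loop_result.append(s)

print(comprehension)  # ['abc123']
print(loop_result)    # ['abc123']
```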

score 84 · Answer 2 · edited Oct 12 '16 at 08:10


All the answers work, but they always traverse the whole list. If I understand your question, you only need the first match, so there is no need to look at the rest of the list once you have found it:

mylist = ['abc123', 'def456', 'ghi789']
sub = 'abc'
next((s for s in mylist if sub in s), None) # returns 'abc123'

If the match is at the end of the list, or the list is very small, it makes no difference, but consider this example:

import timeit

mylist = ['abc123'] + ['xyz123']*1000
sub = 'abc'

timeit.timeit('[s for s in mylist if sub in s]', setup='from __main__ import mylist, sub', number=100000)
# for me 7.949463844299316 with Python 2.7, 8.568840944994008 with Python 3.4
timeit.timeit('next((s for s in mylist if sub in s), None)', setup='from __main__ import mylist, sub', number=100000) 
# for me 0.12696599960327148 with Python 2.7, 0.09955992100003641 with Python 3.4

edited Oct 12 '16 at 08:10

answered May 12 '15 at 17:33

Frank Zalkow


Really great solution, by the way. I used it in some recent code. – Blairg23 Oct 13 '16 at 00:34

That's a good point, thumbs up for you! – Antony Fuentes Dec 08 '17 at 00:29

This is an interesting solution, but if you put ['abc123'] at the end of your "mylist", your solution will still take a very long time. – Angelo Mar 10 '20 at 12:51

@Angelo Agree. The only reason this goes so fast is because it finds the match in the first element. So the time taken for this example is the BEST case. WORST case is it takes just as long (if it's not in the list). Still a good solution. – Chris Collett May 11 '22 at 15:27
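The best-case versus worst-case point in these comments can be checked directly by counting how many elements `next()` inspects before it stops. This is a sketch; the helper `first_match_with_count` is hypothetical, written only for this illustration.

```python
def first_match_with_count(items, sub):
    """Return (first match or None, number of elements inspected). Hypothetical helper."""
    inspected = 0

    def contains(s):
        nonlocal inspected
        inspected += 1
        return sub in s

    # next() pulls from the generator only until the first element passes the test
    match = next((s for s in items if contains(s)), None)
    return match, inspected


best = ['abc123'] + ['xyz123'] * 1000
worst = ['xyz123'] * 1000 + ['abc123']
missing = ['xyz123'] * 1000

print(first_match_with_count(best, 'abc'))     # ('abc123', 1)
print(first_match_with_count(worst, 'abc'))    # ('abc123', 1001)
print(first_match_with_count(missing, 'abc'))  # (None, 1000)
```

Early exit only helps when the match is near the front; in the worst case the generator still scans every element, just as the list comprehension does.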

score 28 · Answer 3 · edited Dec 06 '17 at 14:11


Use a simple for loop:

seq = ['abc123', 'def456', 'ghi789']
sub = 'abc'

for text in seq:
    if sub in text:
        print(text)

yields

abc123
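If only the first match is needed, the same loop can stop early with `break`; the loop's `else` clause runs only when no `break` happened, i.e. when nothing matched. A sketch (the variable name `found` is just illustrative):

```python
seq = ['abc123', 'def456', 'ghi789']
sub = 'abc'

for text in seq:
    if sub in text:
        found = text
        break  # stop scanning once the first match is found
else:
    # runs only if the loop finished without hitting break
    found = None

print(found)  # abc123
```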

edited Dec 06 '17 at 14:11

Peter Mortensen


answered Dec 08 '12 at 16:49

unutbu


score 14 · Answer 4 · answered Dec 08 '12 at 16:49


This prints all elements that contain sub:

for s in filter(lambda x: sub in x, mylist):
    print(s)
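In Python 3, `filter()` returns a lazy iterator rather than a list, so wrap it in `list()` if you need all the matches at once, or pass it to `next()` with a default to get just the first one. A sketch:

```python
mylist = ['abc123', 'def456', 'ghi789']
sub = 'abc'

# filter() yields the items for which the function returns a truthy value
matches = list(filter(lambda x: sub in x, mylist))
print(matches)  # ['abc123']

# Because filter() is lazy in Python 3, next() stops at the first match
first = next(filter(lambda x: sub in x, mylist), None)
print(first)  # abc123
```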

answered Dec 08 '12 at 16:49

Hyperboreus


score 13 · Answer 5 · edited Nov 10 '16 at 13:21


I'd just use a simple regex. You can do something like this:

import re
old_list = ['abc123', 'def456', 'ghi789']
new_list = [x for x in old_list if re.search('abc', x)]
for item in new_list:
    print(item)
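As ThinkChaos points out in the comments, `in` is enough for a literal substring; a regex starts to pay off for patterns or case-insensitive matching. A sketch, assuming the search text should be treated literally (hence `re.escape`) and matched regardless of case (`re.IGNORECASE`); compiling the pattern once avoids re-parsing it for every element.

```python
import re

old_list = ['abc123', 'def456', 'ghi789', 'ABC987']
sub = 'abc'

# re.escape() makes sub match literally, so characters like '.' are not wildcards
pattern = re.compile(re.escape(sub), re.IGNORECASE)

new_list = [x for x in old_list if pattern.search(x)]
for item in new_list:
    print(item)
```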

edited Nov 10 '16 at 13:21

sushant-hiray


answered Dec 08 '12 at 16:56

oathead


Why add complexity? The `in` operator is perfect for the job as seen in other responses. Regexes are a great tool, but I think it's a bit overkill here. – ThinkChaos Apr 24 '15 at 15:11
Very nifty piece of code. – dipl0 Jun 21 '18 at 16:19
