
I have something like this:

extensionsToCheck = ['.pdf', '.doc', '.xls']

for extension in extensionsToCheck:
    if extension in url_string:
        print(url_string)

I am wondering what a more elegant way to do this in Python would be (without using the for loop). I was thinking of something like this (as in C/C++), but it didn't work:

if ('.pdf' or '.doc' or '.xls') in url_string:
    print(url_string)
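The attempt above fails because `or` does not distribute over `in`. The parenthesised expression is evaluated first, and `or` returns its first truthy operand, so the condition is really just `'.pdf' in url_string`. A short sketch with a made-up URL:

```python
url_string = 'http://example.com/report.doc'

# `or` returns its first truthy operand, so the parenthesised part is just '.pdf'
combined = ('.pdf' or '.doc' or '.xls')
print(combined)  # .pdf

# The condition therefore only ever checks for '.pdf'
print(('.pdf' or '.doc' or '.xls') in url_string)  # False, although '.doc' is present

# Spelling out each test works, but gets verbose quickly
print('.pdf' in url_string or '.doc' in url_string or '.xls' in url_string)  # True
```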

Edit: I'm kinda forced to explain how this is different from the question below, which is marked as a potential duplicate (so it doesn't get closed, I guess).

The difference is that I wanted to check whether a string is part of some list of strings, whereas the other question is checking whether a string from a list of strings is a substring of another string. Similar, but not quite the same, and semantics matter when you're looking for an answer online, IMHO. These two questions are actually looking to solve the opposite problem of one another. The solution for both turns out to be the same, though.

tkit
  • Possible duplicate of [Check if multiple strings exist in another string](http://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string) – GingerPlusPlus Feb 28 '16 at 15:58
  • I'm not sure what you mean by your last paragraph. You *do* want to check if one string from a list of strings (the file extensions) is a substring of another string (the url). – mkrieger1 Jul 03 '21 at 17:37

8 Answers


Use a generator expression together with any(), which short-circuits on the first True:

if any(ext in url_string for ext in extensionsToCheck):
    print(url_string)

EDIT: I see this answer has been accepted by the OP. Though my solution may be a "good enough" solution to his particular problem, and is a good general way to check whether any strings in a list are found in another string, keep in mind that this is all it does. It does not care WHERE the string is found, e.g. whether it is at the end of the string. If this matters, as it often does with URLs, you should look at the answer by @Wladimir Palant, or you risk getting false positives.
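any() only tells you whether something matched. If you also need to know which extension matched, or want the check to ignore case, next() over the same generator expression returns the first hit (or a default when nothing matches). A sketch, assuming a made-up URL and that the extensions in the list are already lower-case:

```python
extensionsToCheck = ['.pdf', '.doc', '.xls']
url_string = 'http://example.com/Report.DOC'

# next() with a default returns the first matching extension (or None),
# and short-circuits just like any()
match = next((ext for ext in extensionsToCheck if ext in url_string), None)
print(match)  # None, because the check is case-sensitive

# Lower-casing the URL once gives a case-insensitive check
# (assumes the extensions in the list are lower-case)
lowered = url_string.lower()
match = next((ext for ext in extensionsToCheck if ext in lowered), None)
print(match)  # .doc
```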

Nam G VU
Lauritz V. Thaulow
  • This was exactly what I was looking for. In my case it does not matter where in the string the extension is. Thanks – tkit Jun 30 '11 at 12:15
  • Great suggestion. Using this example, this is how I check whether any of the arguments matches the well-known help flags: any([x.lower() in ['-?','-h','--help', '/h'] for x in sys.argv[1:]]) – AXE Labs May 29 '14 at 20:04
  • @AXE-Labs using list comprehensions inside `any` will negate some of the possible gains that short circuiting provides, because the whole list will have to be built in every case. If you use the expression without square brackets (`any(x.lower() in ['-?','-h','--help', '/h'] for x in sys.argv[1:])`), the `x.lower() in [...]` part will only be evaluated until a True value is found. – Lauritz V. Thaulow May 31 '14 at 23:47
  • And if I want to know what ext is when any() returns True? – Peter Jun 16 '15 at 10:12
  • @PeterSenna: `any()` will only return _true_ or _false_, but see @psun 's list comprehension answer below with this modification: `print [extension for extension in extensionsToCheck if(extension in url_string)]` – Dannid Nov 08 '16 at 17:37
  • Given the accepted answer, how do I also print the variable "ext"? I have tried: ``` url_string = "testing.doc.aee" extensionsToCheck = ['.pdf', '.doc', '.xls'] if any(ext in url_string for ext in extensionsToCheck): print(f"{ext} - {url_string}") ``` but no success. – Bruno Ambrozio Mar 09 '20 at 13:49
  • @LauritzV.Thaulow can we get the element ? – Amal Thachappilly Sep 11 '20 at 12:59
  • Thanks. What should I do if I want to check against a list and ignore case (both upper and lower case)? – Vraj Kotwala Sep 10 '21 at 16:27
extensionsToCheck = ('.pdf', '.doc', '.xls')

'test.doc'.endswith(extensionsToCheck)   # returns True

'test.jpg'.endswith(extensionsToCheck)   # returns False
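A few more cases for this approach, sketched with made-up file names: endswith only accepts a str or a tuple of str (a list raises TypeError, so convert it first), the match is anchored to the end of the string, and it is case-sensitive:

```python
extensionsToCheck = ('.pdf', '.doc', '.xls')

# str.endswith accepts a tuple of suffixes and is True if any one of them matches
print('test.doc'.endswith(extensionsToCheck))  # True

# Only a str or a tuple is accepted; a list raises TypeError, so convert it first
as_list = ['.pdf', '.doc', '.xls']
print('test.xls'.endswith(tuple(as_list)))  # True

# The suffix must be at the very end, so a query string defeats it
print('http://example.com/test.doc?dl=1'.endswith(extensionsToCheck))  # False

# Matching is case-sensitive; lower-case the string first if that matters
print('TEST.PDF'.lower().endswith(extensionsToCheck))  # True
```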
eumiro
  • This one is clever; I didn't know tuples could do that! But it only works when your substring is anchored to one end of the string. – Dannid Nov 08 '16 at 17:38
  • Way cool. I just wish there was something like "contains" rather than just startswith or endswith – BrDaHa Feb 09 '17 at 23:34
  • @BrDaHa you can use `in` for contains: if 'string' in list: – Shekhar Samanta May 03 '18 at 14:53
  • @ShekharSamanta Sure, but that doesn't solve the problem of checking whether one of multiple things is in a string, which is what the original question was about. – BrDaHa May 03 '18 at 19:14
  • Yes, in that case we can use: if any(element in string.split('any delimiter') for element in list), and for a plain string: if any(element in string for element in list) – Shekhar Samanta May 04 '18 at 09:49

It is better to parse the URL properly - this way you can handle http://.../file.doc?foo and http://.../foo.doc/file.exe correctly.

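from urllib.parse import urlparse  # Python 3; Python 2 used "from urlparse import urlparse"
import os

path = urlparse(url_string).path     # drops the scheme, host, query string and fragment
ext = os.path.splitext(path)[1]      # extension of the last path component only
if ext in extensionsToCheck:
    print(url_string)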
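To see the difference this makes, here is the same idea wrapped in a Python 3 function (the name has_checked_extension and the lower-casing are illustrative additions, not part of the answer) and run on the two kinds of URL mentioned above, with a made-up host:

```python
import os
from urllib.parse import urlparse

extensionsToCheck = ['.pdf', '.doc', '.xls']


def has_checked_extension(url_string):
    """Hypothetical helper: True if the URL's path ends in one of the checked extensions."""
    path = urlparse(url_string).path  # query string and fragment are not part of .path
    # splitext only looks at the last path component, e.g. '/foo.doc/file.exe' -> '.exe'
    ext = os.path.splitext(path)[1].lower()
    return ext in extensionsToCheck


print(has_checked_extension('http://example.com/file.doc?foo'))      # True: query ignored
print(has_checked_extension('http://example.com/foo.doc/file.exe'))  # False: it is an .exe
print(has_checked_extension('http://example.com/FILE.PDF'))          # True: compared lower-case
```

A plain substring check would report True for the second URL, since it contains ".doc".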
Wladimir Palant

Use a list comprehension if you want a one-line solution. The following code returns a list containing url_string once for each of the extensions .doc, .pdf and .xls that it contains, or an empty list when it contains none of them.

print([url_string for extension in extensionsToCheck if extension in url_string])

NOTE: This only checks whether the string contains one of the extensions; it is not useful when you want to extract the exact word matching the extension.

psun
  • This is more readable than the `any` solution; in my opinion it's one of the best possible solutions to this question. – Dmitry Verhoturov Sep 08 '16 at 18:23
  • This one is superior to the `any()` solution in my opinion because it can be altered to return the specific matching value as well, like so: `print [extension for extension in extensionsToCheck if(extension in url_string)]` (see my answer for additional details and how to extract the matching _word_ as well as the pattern from the url_string) – Dannid Nov 08 '16 at 18:04

In case anyone faces this task again, here is another solution:

extensionsToCheck = ['.pdf', '.doc', '.xls']
url_string = 'file.doc'
res = [ele for ele in extensionsToCheck if ele in url_string]
print(bool(res))
# Output: True
Aidos

This is a variant of the list comprehension answer given by @psun.

By switching the output value, you can actually extract the matching pattern from the list comprehension (something not possible with the any() approach by @Lauritz-v-Thaulow)

extensionsToCheck = ['.pdf', '.doc', '.xls']
url_string = 'http://.../foo.doc'

print([extension for extension in extensionsToCheck if(extension in url_string)])

['.doc']

You can also insert a regular expression if you want to collect additional information once the matched pattern is known (this could be useful when the list of allowed patterns is too long to write into a single regex pattern):

import re
print([re.search(r'(\w+)' + re.escape(extension), url_string).group(0) for extension in extensionsToCheck if extension in url_string])

['foo.doc']
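If the list is long, the alternation can also be built from the list instead of being written by hand. This is a swapped-in alternative, not part of the original answer; the trailing \b is an assumption added here so that, say, .docx is not treated as .doc:

```python
import re

extensionsToCheck = ['.pdf', '.doc', '.xls']
url_string = 'http://.../foo.doc'

# Build one alternation from the list; re.escape makes each '.' literal.
# The trailing \b requires a word boundary right after the extension.
pattern = re.compile(r'(\w+)(' + '|'.join(map(re.escape, extensionsToCheck)) + r')\b')

m = pattern.search(url_string)
if m:
    print(m.group(0))  # foo.doc
    print(m.group(2))  # .doc
```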

Dannid
  • Hi @Dannid. When I tried your solution I get a syntax error pointing to the "for". Maybe there has been an update in python that requires some different syntax here since your post? Hope you can help me. Thanks – user2382321 Jun 20 '22 at 04:11
  • @user2382321 yeah, I wrote that in python2. python3 requires parentheses for print statements. I updated the code in my example. – Dannid Aug 25 '22 at 17:54

Check if it matches this regex:

r'(\.pdf$|\.doc$|\.xls$)'

Note: if your extensions are not at the end of the URL, remove the $ characters, but that weakens the check slightly (for example, .doc would then also match .docx).
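A sketch of how the pattern might be applied, using re.search (re.match would only look at the start of the string) on made-up URLs, with and without the $ anchors:

```python
import re

# Anchored version: the extension must be at the very end of the string
anchored = re.compile(r'(\.pdf$|\.doc$|\.xls$)')
# Unanchored version: the extension may appear anywhere in the string
unanchored = re.compile(r'(\.pdf|\.doc|\.xls)')

for url in ['http://example.com/file.doc',
            'http://example.com/file.doc?download=1',
            'http://example.com/file.docx']:
    # bool() turns the Match object (or None) into True/False
    print(url, bool(anchored.search(url)), bool(unanchored.search(url)))
```

The first URL matches both patterns, the second matches only without $ because of the query string, and the third also matches only without $, which is the kind of false positive the note warns about.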


This is the easiest way I could imagine :)

list_ = ('.doc', '.txt', '.pdf')
string = 'file.txt'
func = lambda list_, string: any(filter(lambda x: x in string, list_))
func(list_, string)

# Output: True

Also, if someone needs to save elements that are in a string, they can use something like this:

list_ = ('.doc', '.txt', '.pdf')
string = 'file.txt'
func = lambda list_, string: tuple(filter(lambda x: x in string, list_))
func(list_, string)

# Output: ('.txt',)
Levinson