Suppose I have two lists like:

list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama', 'https://en.wikipedia.org/wiki/President_of_the_United_States', 'google.com']

list_of_blacklisted_urls = ['wikipedia']

How can I return True if any of the blacklisted keywords appears anywhere in a URL from list_of_urls? I have tried:

for url in list_of_urls:
        if any(URL in URLs for URL in list_of_blacklisted_urls):
                return True

But I'm quite sure this doesn't work.

Jack
  • Quite sure? Please provide the input, the actual output, and the expected output so this becomes an [MCVE](https://stackoverflow.com/help/minimal-reproducible-example). – Tim Dec 04 '19 at 09:23
  • Consider using a specialist algorithm such as Aho-Corasick, as described in [this answer](https://stackoverflow.com/a/48600345/9209546). – jpp Dec 04 '19 at 09:24
  • `[print(url) for url in list_of_urls for blacklisted in list_of_blacklisted_urls if blacklisted in url]` – Hadi Farah Dec 04 '19 at 09:26
  • Duplicate: [Check if a Python list item contains a string inside another list](https://stackoverflow.com/questions/32290949/check-if-a-python-list-item-contains-a-string-inside-another-list) – Georgy Dec 04 '19 at 10:11
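jpp's comment points to Aho-Corasick, which scans each URL once no matter how many keywords are blacklisted. The following is a minimal pure-Python sketch of that algorithm, assuming plain, case-sensitive substring matching and non-empty keywords; `build_automaton` and `contains_any` are hypothetical names, not a library API.

```python
from collections import deque


def build_automaton(keywords):
    """Build an Aho-Corasick automaton for non-empty keywords."""
    goto = [{}]     # goto[state][char] -> child state (the keyword trie)
    fail = [0]      # fail[state] -> state of the longest proper suffix that is also a trie prefix
    out = [set()]   # out[state] -> keywords recognised on reaching this state

    # 1. Insert every keyword into the trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    # 2. Breadth-first pass: compute failure links and merge outputs.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]
    return goto, fail, out


def contains_any(text, automaton):
    """Return True if any keyword occurs in text (single left-to-right scan)."""
    goto, fail, out = automaton
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if out[state]:
            return True
    return False


list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama',
                'https://en.wikipedia.org/wiki/President_of_the_United_States',
                'google.com']
list_of_blacklisted_urls = ['wikipedia']

automaton = build_automaton(list_of_blacklisted_urls)
print(any(contains_any(url, automaton) for url in list_of_urls))  # True
```

For a handful of keywords the plain `in` checks are simpler; the automaton is mainly worth it when the blacklist is large.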

6 Answers

You're pretty close. Your `any` version fails because `URLs` is never defined (the loop variable is `url`), and `return` is only valid inside a function. A nested loop spells the same check out explicitly.

Here's an example:

list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama', 'https://en.wikipedia.org/wiki/President_of_the_United_States', 'google.com']

list_of_blacklisted_urls = ['wikipedia']

for url in list_of_urls:
    for keyword in list_of_blacklisted_urls:
        if keyword in url:
            print("FOUND", keyword, "in", url)
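The question's `any` version can also be made to work: the generator has to test the loop variable `url`, and the `return` has to sit inside a function. A minimal sketch; `has_blacklisted` is a hypothetical name:

```python
list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama',
                'https://en.wikipedia.org/wiki/President_of_the_United_States',
                'google.com']
list_of_blacklisted_urls = ['wikipedia']


def has_blacklisted(urls, blacklist):
    # Return True as soon as any blacklisted keyword is a substring of any URL
    for url in urls:
        if any(keyword in url for keyword in blacklist):
            return True
    return False


print(has_blacklisted(list_of_urls, list_of_blacklisted_urls))  # True
print(has_blacklisted(['google.com'], list_of_blacklisted_urls))  # False
```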
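jknotek

import re
import pandas as pd
data = pd.DataFrame(list_of_urls)
data = data[data[0].str.contains('|'.join(map(re.escape, list_of_blacklisted_urls)))]

Then inspect `data` to see the matching URLs, or use `not data.empty` to get True when at least one URL matched. Joining the escaped keywords with `|` builds a single regex that works for any number of blacklist entries; unpacking with `*` only works when the blacklist has exactly one entry, because any extra entries are passed to `str.contains` as further positional arguments instead of being treated as patterns.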
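To get the True/False the question asks for directly, a `pd.Series` and `.any()` on the boolean mask are enough. A minimal sketch, assuming pandas is installed and keywords should match literally:

```python
import re

import pandas as pd

list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama',
                'https://en.wikipedia.org/wiki/President_of_the_United_States',
                'google.com']
list_of_blacklisted_urls = ['wikipedia']

# One regex alternation; re.escape makes characters such as '.' match literally
pattern = '|'.join(map(re.escape, list_of_blacklisted_urls))

urls = pd.Series(list_of_urls)
mask = urls.str.contains(pattern)  # boolean Series, one entry per URL
print(bool(mask.any()))            # True
print(urls[mask].tolist())         # the two Wikipedia URLs
```

An empty blacklist produces the empty pattern `''`, which matches every URL, so guard against that case if the list can be empty.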

Rafael Ferreira

How about this:

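def in_black_urls():
    for black_url in list_of_blacklisted_urls:
        # Substring test per URL; `black_url in list_of_urls` would only match whole list items
        if any(black_url in url for url in list_of_urls):
            return True
    return False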
Tom Chen

Just one line, keep it simple:

len([x for x in list_of_urls if any(y in x for y in list_of_blacklisted_urls)]) > 0
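The one-liner above builds the full list of matching URLs before counting it. Passing a generator straight to `any()` gives the same answer but stops at the first match. A minimal sketch:

```python
list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama',
                'https://en.wikipedia.org/wiki/President_of_the_United_States',
                'google.com']
list_of_blacklisted_urls = ['wikipedia']

# Same test as len([...]) > 0, but any() short-circuits on the first hit
found = any(keyword in url
            for url in list_of_urls
            for keyword in list_of_blacklisted_urls)
print(found)  # True
```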
Itachi

You can use a nested loop and the `in` operator:

list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama', 'https://en.wikipedia.org/wiki/President_of_the_United_States', 'google.com']
list_of_blacklisted_urls = ['wikipedia']

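def checker(urls, blacklist):
    found = False
    for url in urls:
        for keyword in blacklist:
            if keyword in url:
                print(True, url, keyword)  # report each match
                found = True
    return found  # False only when no URL contains any keyword

checker(list_of_urls, list_of_blacklisted_urls)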
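If you want the matches themselves as well as the yes/no answer, returning them instead of printing keeps the function reusable: an empty list is falsy, so `bool(...)` of the result answers the question. A minimal sketch; `find_matches` is a hypothetical name:

```python
list_of_urls = ['https://en.wikipedia.org/wiki/Barack_Obama',
                'https://en.wikipedia.org/wiki/President_of_the_United_States',
                'google.com']
list_of_blacklisted_urls = ['wikipedia']


def find_matches(urls, blacklist):
    # Every (url, keyword) pair where the keyword is a substring of the URL
    return [(url, keyword) for url in urls for keyword in blacklist if keyword in url]


matches = find_matches(list_of_urls, list_of_blacklisted_urls)
print(len(matches))   # 2
print(bool(matches))  # True
```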
timanix

Using a nested list comprehension:

def blacklisted(all_urls, blacklist):
    # True if any blacklisted word occurs in any URL, False otherwise
    return len([word for url in all_urls for word in blacklist if word in url]) > 0
zegoat7