
web1, web2 and web3 are three different websites I am scraping. I want to check whether they contain the names company1, company2 or company3. This worked perfectly before but is not working now. print(matches) returns all the companies, like this: ['company1', 'company2', 'company3']. When I print web1, web2 and web3, I can see that they do not contain any of the company names. This is the code:

companies = [ 'company1', 'company2', 'company3' ]

matches = [ x for x in companies if x in web1 or web2 or web3 ]

print(matches)

When I change matches to check only one source at a time, no company names are found. The problem appears when I try to search multiple sources at the same time. All help is appreciated.

dejanualex
Lohant00
  • What is `x in web1 or web2 or web3` supposed to mean? – bereal Oct 15 '19 at 08:00
  • As @bereal hints at, your condition is not testing what you think it’s testing ... it tests if `x in web1` or `web2` is non-empty or `web3` is non-empty – donkopotamus Oct 15 '19 at 08:03

2 Answers


Your expression is slightly off but, unfortunately, it is still valid Python, so it fails silently instead of raising an error. The logic you intended to use was this:

matches = [ x for x in companies if x in web1 or x in web2 or x in web3 ]

That is, you want to check whether a given company name appears in any of the three websites. As you wrote it, the condition is parsed as (x in web1) or web2 or web3, so every company passes whenever web2 or web3 is a non-empty string, because non-empty strings are truthy.
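To make the evaluation order concrete, here is a minimal sketch. The page contents are made-up placeholders, not the asker's real data; none of them mentions a company.

```python
# Hypothetical page contents for illustration; no company name appears in any of them.
web1 = "some scraped text"
web2 = "more scraped text"
web3 = "even more text"

x = "company1"

# `in` binds tighter than `or`, so this is (x in web1) or web2 or web3.
# (x in web1) is False, so `or` moves on and returns web2 itself.
result = x in web1 or web2 or web3
print(repr(result))   # 'more scraped text'
print(bool(result))   # True, because a non-empty string is truthy

# The intended test repeats the membership check for every source.
intended = x in web1 or x in web2 or x in web3
print(intended)       # False
```

Note that the original condition does not even produce a boolean here: it yields the string web2, which the comprehension's `if` then treats as true.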

Try this corrected version:

companies = [ 'company1', 'company2', 'company3' ]

web1 = 'http://www.company1.com'
web2 = 'http://www.company1.com'
web3 = 'http://www.company1.com'
matches = [ x for x in companies if x in web1 or x in web2 or x in web3 ]

print(matches)

This prints only:

['company1']

Note that I intentionally set all three website addresses to match only the first company, to show that only this company gets printed.
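If the number of sources grows, repeating `x in ...` for each one gets unwieldy. One possible alternative, sketched here under the assumption that the scraped texts are collected in a list (the name `pages` and its contents are hypothetical), is the built-in any():

```python
companies = ['company1', 'company2', 'company3']

# Hypothetical scraped text standing in for web1, web2 and web3.
pages = [
    'news about company1',
    'nothing relevant here',
    'company3 announces results',
]

# Keep a company if its name occurs in at least one page.
matches = [x for x in companies if any(x in page for page in pages)]

print(matches)  # ['company1', 'company3']
```

any() stops at the first page that contains the name, so it behaves like the chained `or` version while scaling to any number of sources.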

Tim Biegeleisen

Your boolean expression x in web1 or web2 or web3 is parsed as (x in web1) or web2 or web3, so it is truthy whenever web2 is a non-empty string. You could concatenate all the websites and check if x is in that:

matches = [ x for x in companies if x in web1 + web2 + web3 ]
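One caveat with plain concatenation is that a name could appear to match across the boundary between two pages. The sketch below uses contrived, hypothetical page contents to show this, and joins with a newline as one possible workaround, assuming the separator never occurs inside a company name:

```python
# Contrived, hypothetical contents: no single page contains 'company1'.
web1 = 'text ending in comp'
web2 = 'any1 starts this page'
web3 = ''

x = 'company1'

# Plain concatenation glues 'comp' and 'any1' together.
joined_plain = web1 + web2 + web3
print(x in joined_plain)  # True, a false positive

# Joining with a separator keeps the pages apart.
joined_sep = '\n'.join([web1, web2, web3])
print(x in joined_sep)    # False
```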
Ollie