I am looking for the value "Not found", but my code is not finding it. Instead, when the value is "Not Found", it just crashes.

Here is the code:

for key, value in productLinks.iteritems():
    if value is "Not Found":
        print value
    else:
        print value
        html = urllib2.urlopen(value)
        soup = BS(html)
        foundPrice = soup.find('s')
        if foundPrice is not None:
            print "found a price"
        else:
            print "No link"

Here is the error:

Traceback (most recent call last):
  File "asimsScrapper.py", line 28, in <module>
    html = urllib2.urlopen(value)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
    protocol = req.get_type()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: Not found
Asim Zaidi

3 Answers

4

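The is keyword performs identity comparison: it checks whether two names refer to the same object (for example, when testing against None or comparing type objects), not whether two values are equal.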

You probably wanted to use ==:

if value == "Not Found":

Also, instead of checking for a "Not Found" (or "Not found") string, you can validate the URL, like this:

for key, value in productLinks.iteritems():
    if value.startswith('http'):
        print value
        html = urllib2.urlopen(value)
        ...

Or, even better, use urlparse to validate the URL.
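
A minimal sketch of what a urlparse-based check might look like. The helper name looks_like_url and the sample product_links dictionary are made up for illustration and are not from the question; the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3


def looks_like_url(value):
    """Return True if value parses as an http(s) URL with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical sample data standing in for productLinks.
product_links = {
    "widget": "http://example.com/widget",
    "gadget": "Not found",
}

for key, value in sorted(product_links.items()):
    if looks_like_url(value):
        print("would fetch %s" % value)
    else:
        print("skipping %s: %r is not a URL" % (key, value))
```

This rejects any placeholder string, whatever its spelling or capitalisation, so the loop no longer depends on matching "Not Found" exactly.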

alecxe
  • Okay, `is` is not used for type checking in Python; it (essentially) checks memory address. See http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers or one of the other thousands of references on this material. – a p Sep 13 '13 at 08:09
  • @ap ok, thank you, what is important is that the OP shouldn't use it here. – alecxe Sep 13 '13 at 08:12
  • It is also immaterial to the actual issue at hand, apparently. He seems to be trying to use the string "Not found" as a URL and pass that back up through libraries to his glue code. Python doesn't have a built-in "not found" monad, so that's not going to work... – a p Sep 13 '13 at 08:17
  • Jesus everyone is upvoting this because it addresses what appears to be an obvious problem -- READ the question before responding. His trace should give some evidence that this is not the problem at hand. Open ipy and type `__import__('urllib2').urlopen('not found')` and I promise you'll get the same error. Figure it out SOirclejerkers. – a p Sep 13 '13 at 08:22
1

The error:

html = urllib2.urlopen(value),
unknown url type: Not found

You are trying to open the URL "Not found", which is not a valid URL.

The root cause is the comparison value is "Not found". Please use value == "Not found" instead.

Rong Zhao
0

alecxe is right that you should use == to check value equivalence, but his reasons are wrong. When in doubt, always test equivalence with ==. The is operator simply tests for identity, which is different from equivalence. Identity has more to do with where something is stored in memory than with what the value in memory is. There are a number of places to read about this on SO and elsewhere, but the takeaway is that is is not ==.
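
To make the distinction concrete, here is a small sketch (the variable names are illustrative only). The list results are guaranteed by the language; whether two equal strings happen to be the same object depends on interpreter caching, so that case is deliberately not asserted.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # same contents, but a separate object
c = a           # a second name for the very same object

print(a == b)   # True: the values are equal
print(a is b)   # False: two distinct objects
print(a is c)   # True: one object, two names

expected = "Not Found"
value = "".join(["Not ", "Found"])  # built at runtime, like scraped data

print(value == expected)  # True: the characters match
# `value is expected` may be True or False depending on the interpreter,
# which is why `is` should not be used to compare string values.
```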

Your error seems to be unrelated to this. If you call urllib2.urlopen("not found"), it is definitely going to give you exactly this error. You want to catch that value before passing it to urllib2.
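
One way to catch it, sketched under the assumption that a missing link should simply be skipped: wrap the call and handle the ValueError that urlopen raises for a string with no recognisable scheme. The helper name open_or_none is hypothetical, and the import fallback only exists so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3


def open_or_none(value):
    """Return an open response, or None if value is not a usable URL string."""
    try:
        return urlopen(value)
    except ValueError:
        # Raised before any network access for strings such as "Not found"
        # ("unknown url type"). Network failures are a different exception
        # type and still propagate to the caller.
        return None


response = open_or_none("Not found")
print(response is None)  # True
```

The calling loop can then test the result for None instead of comparing against a particular placeholder string.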

a p