I am trying to write an exception while parsing a link:

import csv

import requests
from bs4 import BeautifulSoup

# Each row of IDs.csv is "date,ID"
with open('IDs.csv') as IDFile:
    rows = list(csv.reader(IDFile))

for row in rows:
    col1, col2 = row
    ID = "%s" % (col2)

    url = requests.get("http://.......")
    soup = BeautifulSoup(url.text, "lxml")
    print(soup)
    ## execute more code if "results:" is greater than zero

The output of "print(soup)" is:

<html><body><p>{ success:true ,results:0,rows:[], ID:5432}</p></body></html>

The IDs.csv contains:

14-Aug-2015,5431
30-May-2015,5432
17-Feb-2015,5433

I want to write an exception where:

If the output of "print(soup)" contains "results:0", then append the variable ID (5432 in this case) to the zero-results.txt file and process the next ID (5433) from my IDs.csv file.

Otherwise, if the output of "print(soup)" contains "results:1" or greater, continue to process the remaining code.

Please help, thanks.

zs_python
  • There are some problems with your question. What does it have to do with exceptions? Which part do you need help with? Are you wondering how to extract `results:0` and `ID:5432`? Do you want to know how to append to a file? These are separate issues that belong in separate questions. – Alex Hall Oct 09 '15 at 13:21
  • You need to parse JSON, analyse it, act accordingly, well just write the program. – Andrey Oct 09 '15 at 13:24
  • @alex-hall, I am a noob, so I may be technically wrong in using the term 'exception'. But yes, I do want to know how to extract the value of "results" – zs_python Oct 09 '15 at 13:48
  • @Andrey Ironically that's valid JS syntax for an object but it's not JSON because there aren't quotes. Some searching tells me that [demjson](http://deron.meranda.us/python/demjson/), [RSON](https://code.google.com/p/rson/) and YAML may be suitable for parsing this. However if their use case is simple enough regexes will suffice. – Alex Hall Oct 09 '15 at 13:58
  • @andrey I am just learning to parse, so any help parsing the value of "results:" will really help me learn. Thank you – zs_python Oct 09 '15 at 13:58
  • @zs_python this looks like JSON. 1) Extract contents of that element 2) use python JSON module to parse it 3) extract results – Andrey Oct 09 '15 at 14:04
  • If you are getting a parse error exception as the title suggests, then please post the entire exception message. – Jim K Oct 09 '15 at 14:41
  • @Andrey again, that is NOT JSON. JSON requires that the keys be enclosed in double quotes. `json.loads("{ success:true ,results:0,rows:[], ID:5432}")` fails. – Alex Hall Oct 09 '15 at 15:01
  • @AlexHall it is JSON in practice, just not well formed; you can parse it this way: http://stackoverflow.com/questions/1931454/how-to-parse-somewhat-wrong-json-with-python – Andrey Oct 09 '15 at 15:37
  • Yes, I saw that question after some searching which led me to suggest YAML an hour ago. It doesn't make sense to say " it is json in practice, just not well formed". It simply isn't JSON. In particular I was pointing out that your suggestion "2) use python JSON module to parse it" wouldn't work. – Alex Hall Oct 09 '15 at 15:50

1 Answer

Here is some code to grab the result number:

import re

content = str(soup)
# \d+ captures the whole number, not just its first digit
matchObj = re.search(r"results:(\d+)", content)
resultNum = int(matchObj.group(1))
if resultNum > 0:
    # do stuff
    pass
else:
    # do stuff
    pass

To grab the ID, again use the re module, or use one of the soup methods.
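
For the regex route, here is a minimal sketch, assuming the response text always has the shape shown in the question. `extract_field` is a hypothetical helper name used only for illustration; it is not part of `re` or BeautifulSoup.

```python
import re


def extract_field(content, name):
    """Return the integer that follows `name:` in content, or None if absent."""
    # \b keeps "ID" from matching inside a longer key; \d+ captures every digit.
    match = re.search(r"\b" + re.escape(name) + r"\s*:\s*(\d+)", content)
    return int(match.group(1)) if match else None


# Sample text copied from the output shown in the question.
sample = ("<html><body><p>{ success:true ,results:0,rows:[], ID:5432}"
          "</p></body></html>")
print(extract_field(sample, "results"))
print(extract_field(sample, "ID"))
```

Returning None when the key is missing lets the caller decide what to do with an unexpected response instead of crashing on `match.group(1)`.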

I think your question is asking about if statements. Exceptions are part of error handling, which is a different topic.
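
The if-statement approach could look roughly like the sketch below, which uses `continue` to move on to the next ID. The names `process_ids` and `fetch_text` are hypothetical: `fetch_text` stands in for whatever request returns the page text for one ID, and treating a response with no "results:" field as zero results is an assumption, not something the question specifies.

```python
import re


def process_ids(rows, fetch_text, zero_path="zero-results.txt"):
    """Append IDs with zero results to zero_path; return the IDs that had results."""
    with_results = []
    for _date, id_ in rows:
        match = re.search(r"results:(\d+)", fetch_text(id_))
        # Assumption: a missing "results:" field is handled like results:0.
        if match is None or int(match.group(1)) == 0:
            # Mode "a" appends to the file instead of overwriting it.
            with open(zero_path, "a") as out:
                out.write(id_ + "\n")
            continue  # skip the rest of the loop body and take the next ID
        # The remaining processing for IDs with results would go here.
        with_results.append(id_)
    return with_results
```

With the three rows from IDs.csv and a `fetch_text` that reports zero results only for 5432, this would write 5432 to the file and return the other two IDs.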

Jim K