
I have to write a program that scrapes the website "LivingSocial" for all the details of their deals and promotions and stores them in a MySQL database.
URL: "http://www.livingsocial.com/cities/15-san-francisco"
So far I have been able to write this code:

from lxml import html
import requests
import MySQLdb

# connect
db = MySQLdb.connect(host="localhost", user="root", passwd="",
                     db="scrapy")

x = db.cursor()


page = requests.get('https://www.livingsocial.com/cities/15-san-francisco')
tree = html.fromstring(page.content)

# Extract the deal descriptions and locations
descrip = tree.xpath('//p[@class="description"]/text()')
loc = tree.xpath('//p[@class="location"]/text()')


try:
   x.execute("""INSERT INTO scrapy VALUES (%s,%s)""",(descrip,loc))
   db.commit()
except:
   db.rollback()

db.close()

I have created the MySQL database using a XAMPP server.
But this code doesn't seem to run as expected. Please help!

Suraj P Patil
  • *doesn't seem to run as expected* Exactly what were you expecting, and what did you actually observe? What have your debugging attempts revealed? – lurker Jul 02 '16 at 12:07

1 Answer


The first problem is that you are extracting descriptions and locations independently, which will cause problems since not every deal has a location and the two lists can get out of sync. Instead, you need to iterate over all "deals" and, for each deal, get its description and location:

deals = [
    {'description': deal.findtext('.//p[@class="description"]'),
     'location': deal.findtext('.//p[@class="location"]')}
    for deal in tree.xpath("//li[@dealid]")
]

Here deals would become a list of dictionaries. If no location is provided, it would be None (which then would become NULL in MySQL).
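
For illustration, with made-up values the resulting structure would look something like this (the actual text depends on whatever deals are on the page):

deals = [
    # hypothetical values, purely for illustration
    {'description': 'Two-hour wine tasting for two', 'location': 'Napa Valley'},
    {'description': '50% off an online photography course', 'location': None},
]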

The next problem is inserting the deals into the database. For that, we can use executemany():

x.executemany("""
    INSERT INTO 
        scrapy 
    VALUES 
        (%(description)s, %(location)s)
""", deals)
db.commit()
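
Note that an INSERT without an explicit column list only works if the scrapy table has exactly two columns, in that order. The question doesn't show the schema, so as a rough sketch (assuming the column names description and location) the table could be created with the same cursor:

# Assumed schema; adjust column names and types to match your actual table.
x.execute("""
    CREATE TABLE IF NOT EXISTS scrapy (
        description TEXT,
        location    VARCHAR(255)
    )
""")
db.commit()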

As a side note, you should not use a bare except clause; catch more specific exceptions instead. Also, you are silently catching the error and rolling back, which hides what actually went wrong and which error was raised. Log your errors:

try:
    # ...
except MySQLdb.Error as e:  # catch the database-specific error instead of a bare except
    print(e)  # TODO: properly log the exception
    db.rollback()
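
Putting the pieces together, here is a rough sketch of how the whole script could look under the assumptions above (the question's URL and connection settings, a two-column scrapy table, and MySQLdb.Error as the specific exception):

from lxml import html

import MySQLdb
import requests

URL = "https://www.livingsocial.com/cities/15-san-francisco"


def scrape_deals(url):
    """Fetch the page and return a list of deal dictionaries."""
    page = requests.get(url)
    page.raise_for_status()  # fail loudly on HTTP errors
    tree = html.fromstring(page.content)

    return [
        {'description': deal.findtext('.//p[@class="description"]'),
         'location': deal.findtext('.//p[@class="location"]')}
        for deal in tree.xpath("//li[@dealid]")
    ]


def save_deals(deals):
    """Insert the scraped deals into the (assumed two-column) scrapy table."""
    db = MySQLdb.connect(host="localhost", user="root", passwd="", db="scrapy")
    cursor = db.cursor()
    try:
        cursor.executemany("""
            INSERT INTO scrapy VALUES (%(description)s, %(location)s)
        """, deals)
        db.commit()
    except MySQLdb.Error as e:
        print(e)  # TODO: properly log the exception
        db.rollback()
    finally:
        db.close()


if __name__ == "__main__":
    save_deals(scrape_deals(URL))
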
alecxe