I wrote a script that scrapes multiple URLs, uses BeautifulSoup to collect the useful information into two arrays (ids and names), and then inserts the values of these arrays into a MySQL table, where ids[0] and names[0] form row 0 of the table, and so on.

However, my code is very ugly and I am sure there are much better approaches than mine.

Can anybody give me a hint? I specifically need input on how to iterate through the two arrays.

Thanks in advance!

#!/usr/bin/env python
from bs4 import BeautifulSoup
from urllib import urlopen
import MySQLdb 

#MySQL Connection
mysql_opts = {
    'host': "localhost",
    'user': "********",
    'pass': "********",
    'db':   "somedb"
    }
mysql = MySQLdb.connect(mysql_opts['host'], mysql_opts['user'], mysql_opts['pass'], mysql_opts['db'])
cursor = mysql.cursor()  # cursor used by the insert loop below

#Add Data SQL Query
data_query = ("INSERT INTO tablename "
               "(id, name) "
               "VALUES (%s, %s)")

#Urls to scrape
url1  = 'http://somepage.com'
url2  = 'http://someotherpage.com'
url3  = 'http://athirdpage.com'

#URL Array
urls = (url1,url2,url3)

#Url loop
for url in urls:
    soupPage = urlopen(url)
    soup = BeautifulSoup (soupPage)

    ids = soup.find_all('a', style="display:block")
    names = soup.find_all('a', style="display:block")

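    i = 0
    print len(ids)  # number of matching links on this page
    while (i < len(ids)):
        try:
            id = ids[i]
            vid = id['href'].split('=')
            vid = vid[1]
        except IndexError:
            vid = "leer"  # placeholder when the href has no '=' ("leer" is German for "empty")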

        try:
            name = names[i]
            name = name.contents[0]
            name = name.encode('iso-8859-1')
        except IndexError:
            name = ""

        data_content = (vid, name)
        cursor.execute(data_query, data_content)
        emp_no = cursor.lastrowid
        i = i + 1
Riscie
  • I think I found an answer: `for id, name in newlist(ids, names): #do something with it... print(id, name)` Can someone confirm this is the best approach? See http://stackoverflow.com/questions/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python – Riscie May 21 '13 at 11:22
  • Corrected: `for id, name in zip(ids, names): #do something with it... print(id, name)` – Riscie May 21 '13 at 11:30

1 Answer

My comment seems to be the answer. I just tested it:

for vid, name in zip(ids, names):
    vid = vid['href'].split('=')
    vid = vid[1]

    name = name.contents[0]
    name = name.encode('iso-8859-1')

    data_content = (vid, name)
    cursor.execute(data_query, data_content)
    emp_no = cursor.lastrowid

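For a more general form, see: How can I iterate through two lists in parallel? (http://stackoverflow.com/questions/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python)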
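To try the pattern without a web page or a database, here is a minimal sketch that uses plain lists as stand-ins for the BeautifulSoup results. The sample values and the helper `extract_id` are made up for illustration and are not part of the original script. The helper brings back the fallback for links without an `=`, which the shorter loop no longer has.

```python
# Stand-in data: in the real script these values would come from soup.find_all(...).
hrefs = ["watch?v=abc123", "watch?v=def456", "no-id-here"]
names = ["First video", "Second video", "Third video"]


def extract_id(href, default="leer"):
    """Return the text after the first '=' in href, or default if there is no '='."""
    parts = href.split("=")
    return parts[1] if len(parts) > 1 else default


# zip pairs the i-th element of each list and stops at the end of the shorter list.
rows = [(extract_id(href), name) for href, name in zip(hrefs, names)]

for vid, name in rows:
    print("%s: %s" % (vid, name))
```

Because zip stops at the shorter input, a length mismatch between the two lists would drop the extra items silently, so it may be worth comparing the lengths first if that matters.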

Sorry for the duplicate. If anybody can add something to the answer, feel free.
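One possible addition: once the (id, name) pairs are collected, they could be inserted in a single batch with `executemany` instead of one `execute` call per row. The sketch below uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in for the MySQL connection, so it runs anywhere. The sample rows are invented, and the `?` placeholders are sqlite3's style, whereas MySQLdb uses `%s` as in `data_query`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL connection (assumption).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE tablename (id TEXT, name TEXT)")

# Stand-in rows; in the scraper these would be built with zip(ids, names).
rows = [("abc123", "First video"), ("def456", "Second video")]

# One call inserts every pair; each tuple supplies the values for one row.
cursor.executemany("INSERT INTO tablename (id, name) VALUES (?, ?)", rows)
conn.commit()  # make the inserts permanent

stored = cursor.execute("SELECT id, name FROM tablename ORDER BY id").fetchall()
print(stored)
```

With MySQLdb the same idea should carry over as `cursor.executemany(data_query, rows)` followed by `mysql.commit()`. As far as I know, MySQLdb connections start with autocommit off, so without a commit the rows may not persist.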

Community
Riscie