I'm trying to learn web scraping, and the output isn't coming out in the format I was hoping for. Here is the code I'm running:
import urllib
import re

pagelist = ["page=1","page=2","page=3","page=4","page=5","page=6","page=7","page=8","page=9","page=10"]
ziplocations = ["=30008","=30009"]
i=0
while i<len(pagelist):
    url = "http://www.boostmobile.com/stores/?" +pagelist[i]+"&zipcode=30008"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<h2 style="float:left;">(.+?)</h2>'
    pattern = re.compile(regex)
    storeName = re.findall(pattern,htmltext)
    print "Store Name=", storeName[i]
    i+=1
This code produces this result:
Store Name = Boost Mobile store by wireless depot
Store Name = Wal-Mart
.....
and so on for 10 different stores. I'm assuming this happens because
while i<len(pagelist):
only runs ten times, so it prints only ten of the stores instead of all the stores listed on all pages.
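To check that assumption without hitting the site, here is a minimal offline sketch. The nested lists are hypothetical stand-ins for what re.findall might return on each page (the store names are placeholders, not real results), and it is written so it runs on Python 2 or 3. It suggests that indexing with [i] picks just one match per page, which would also explain getting exactly ten lines:

```python
# Hypothetical stand-ins for the list re.findall returns on each page.
# The real pages are not fetched here; the names are placeholders.
pages = [
    ["store A1", "store A2", "store A3"],
    ["store B1", "store B2", "store B3"],
    ["store C1", "store C2", "store C3"],
]

picked = []
i = 0
while i < len(pages):
    store_names = pages[i]         # every match found on page i
    picked.append(store_names[i])  # but only the i-th match is kept
    i += 1

for name in picked:
    print("Store Name = %s" % name)
```

With this made-up data, only one name per page is printed, and it is a different position on each page.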
When I change the second-to-last line to this
print storeName
it prints every store name listed on each page, but not in the format above. Instead it looks like this: 'Boost mobile store by wireless depot', 'boost mobile store by kob wireless', 'marietta check chashing services',..... and so on for about another 120 entries. So how do I get the desired format of "Store Name = ...." rather than 'name','name',.....?
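For reference, this is roughly the behaviour I'm after, as a minimal sketch on a hard-coded string instead of the live page. The sample_html value and the format_store_names helper are made up for illustration, and I'm assuming that looping over the list returned by findall is the right direction:

```python
import re

# Hypothetical sample markup standing in for one downloaded page.
sample_html = (
    '<h2 style="float:left;">Boost Mobile store by wireless depot</h2>'
    '<h2 style="float:left;">Wal-Mart</h2>'
)

# Same regex as in the question, compiled once.
pattern = re.compile('<h2 style="float:left;">(.+?)</h2>')

def format_store_names(htmltext):
    """Return one 'Store Name = ...' line per match found in htmltext."""
    return ["Store Name = %s" % name for name in pattern.findall(htmltext)]

# Print each formatted line separately instead of printing the whole list.
for line in format_store_names(sample_html):
    print(line)
```

In the real script, the same inner loop would presumably go inside the while loop, once per page, in place of the print of storeName[i].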