I am trying to gather data on the cost of living index for some towns in Texas (USA) from the following website: http://www.city-data.com/city/Texas.html
Approach: to repeatedly extract links from the target page, I use the function below:
from bs4 import BeautifulSoup
import requests
import re

def getLinks(url):
    # Fetch the page that was passed in rather than a hard-coded URL.
    r = requests.get(url)
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(r.content, "html.parser")
    links = []
    # Collect the href of every <a> tag whose href starts with http://.
    for link in soup.find_all('a', attrs={'href': re.compile("^http://")}):
        links.append(link.get('href'))
    return links

print(getLinks("http://www.city-data.com/city/Texas.html"))
Dataset: http://www.city-data.com/city/Texas.html links to the following pages, which hold information about the towns (listed here with their number of inhabitants):
Abilene, TX 120,958
Abram-Perezville 6,663
Addison, TX 15,457
Alamo Heights 7,806
Alamo, TX 19,224
Aldine 15,869
Alice, TX 19,395
Allen, TX 94,179
Alton North 6,182
Note: the goal is to gather the data from the sub-pages, so I need a parser that loops through them, for example:
http://www.city-data.com/city/Abilene-Texas.html
http://www.city-data.com/city/Abram-Perezville-Texas.html
http://www.city-data.com/city/Addison-Texas.html
http://www.city-data.com/city/Alamo-Heights-Texas.html
and so forth. At the moment, however, I get back:
ModuleNotFoundError: No module named 'BeautifulSoup'
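For the sub-page loop itself, a minimal sketch might look like the following. It uses only the standard library for the URL handling, so it is independent of the import problem above. The names city_page_urls, crawl, fetch, parse and delay are hypothetical helpers made up for illustration, and the URL pattern is an assumption derived from the example sub-page URLs listed above. Resolving each href against the index page also covers the case where the index page uses relative links, which the "^http://" filter would skip (whether it does is an assumption, not something verified here).

```python
import re
import time
from urllib.parse import urljoin

BASE_URL = "http://www.city-data.com/city/Texas.html"

# Assumption: every town page follows the "<Name>-Texas.html" pattern
# seen in the example URLs (e.g. .../city/Abilene-Texas.html).
CITY_PAGE = re.compile(r"^https?://www\.city-data\.com/city/[^/]+-Texas\.html$")


def city_page_urls(hrefs, base_url=BASE_URL):
    """Resolve hrefs against base_url and keep unique town pages, in order."""
    seen = set()
    urls = []
    for href in hrefs:
        url = urljoin(base_url, href)  # turns relative links into absolute ones
        if CITY_PAGE.match(url) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def crawl(urls, fetch, parse, delay=1.0):
    """Call fetch(url) for each URL and return {url: parse(html)}."""
    results = {}
    for url in urls:
        results[url] = parse(fetch(url))
        time.sleep(delay)  # pause between requests to be polite to the server
    return results
```

Here fetch could be something like lambda url: requests.get(url).text, and parse would be whatever function pulls the cost of living index out of one town page; both are left open because the layout of those pages is not shown in this post.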
PS: In my first attempt I used urllib2, but that is Python 2 only, so I changed it to urllib3. I am not sure whether this is correct, or whether this module is available in my Anaconda installation, which is pretty important. By the way, what about urllib2.urlopen? That seems to be outdated too, so I need to rewrite that call as well. What do you think? At the moment I am a bit confused about urllib.urlopen. I look forward to hearing from you!
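On the urlopen point: in Python 3 the functionality of urllib2 lives in the standard-library module urllib.request, so urllib2.urlopen becomes urllib.request.urlopen. urllib3 is a separate third-party package and not a drop-in replacement. A minimal sketch of a Python 3 fetch helper, where fetch_html and its timeout default are hypothetical names chosen for illustration and the User-Agent header is an assumption (some sites may reject the default Python one):

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download url and return its body decoded to text."""
    # Assumption: a browser-like User-Agent; adjust or drop as needed.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Since requests is already imported in the function above, requests.get(url).text would do the same job; the sketch only shows what the old urllib2 call translates to.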
Update: thanks to the hints from Andrej and Guilherme, I saw that I have the following setup in my packages:
So I need to rewrite the modules that I import. Many thanks for the hint!