I was practicing writing a scraper script this weekend. My approach was to adapt a scraper that I've worked with before, changing it from scraping a stock "price" to scraping an attribute: the colors used on a website. I've researched libraries and tools such as lxml and Beautiful Soup and attempted some debugging, but I can't quite figure it out.
Goal: return a list of all of the colors used on a website
This is what I wrote:
import urllib
import re
url="https://cloud.google.com/edu"
htmlfile = urllib.urlopen(url)
htmlsource = htmlfile.read()
regex = '<color:#aaa>'
pattern = re.compile(regex)
color = re.findall(pattern, htmlsource)
print "color", color
What I keep getting in return is: color
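For comparison, here is a minimal sketch of one way the matching step could be written. It rests on several assumptions not in the code above: it uses Python 3 (`urllib.request` instead of Python 2's `urllib.urlopen`), it only looks for 3- or 6-digit hex colors such as `#aaa` or `#ffffff`, and the helper names `fetch_html` and `find_hex_colors` are made up for illustration. The literal pattern `'<color:#aaa>'` only matches that exact string, which is unlikely to appear in any page, so the sketch swaps in a general hex-color pattern. It can also pick up non-color text such as a `#add` link fragment, and many sites keep their colors in external CSS files that the page HTML alone will not contain.

```python
import re
import urllib.request

# Matches "#" followed by exactly 3 or 6 hex digits (e.g. #aaa, #FFFFFF).
# Assumption: named colors, rgb()/hsl() values and 4/8-digit hex are ignored.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}\b")


def fetch_html(url):
    """Download a page and return its source as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_hex_colors(text):
    """Return the unique hex colors found in text, lowercased, in first-seen order."""
    seen = []
    for match in HEX_COLOR.findall(text):
        color = match.lower()
        if color not in seen:
            seen.append(color)
    return seen


# Offline example on a small hand-written snippet (no network needed).
sample = '<p style="color:#AAA">hi</p><style>body { background: #ffffff; color: #aaa; }</style>'
print("color", find_hex_colors(sample))
```

To try it on a live page, something like `find_hex_colors(fetch_html(url))` should work, though the result depends on how much styling the page keeps inline rather than in linked stylesheets.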