filehandle = urllib.urlopen(myurl)
Because I want to run a regex over the page contents afterwards, I need to convert the filehandle from an object to a string.
How can I store the webpage's source code in a string?
It's pretty simple:
page = filehandle.read()
You can also iterate over it, like:
lines = []
for line in filehandle:
    lines.append(line)
For extracting data, use BeautifulSoup or lxml.
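To tie this back to the question, here is a minimal sketch of the read-then-regex workflow. It assumes Python 3, where the function lives in urllib.request and read() returns bytes that have to be decoded before a str regex can be applied. The helper name fetch_text, the UTF-8 default, and the data: URL (used only so the sketch runs without a network connection) are illustrative assumptions, not part of the original code.

```python
import re
import urllib.parse
import urllib.request


def fetch_text(url, encoding="utf-8"):
    """Return the body of `url` as a str (hypothetical helper)."""
    # urlopen returns a file-like response; in Python 3, read() yields bytes.
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


# Assumption: a data: URL stands in for a real page so this runs offline.
# Replace demo_url with an http(s) URL to fetch an actual site.
html = "<title>Example Page</title>"
demo_url = "data:text/html," + urllib.parse.quote(html)

page = fetch_text(demo_url)

# Once the page is a plain string, the regex works as usual.
match = re.search(r"<title>(.*?)</title>", page)
title = match.group(1) if match else None
```

In Python 2, as in the snippets here, filehandle.read() already returns a str, so the decode step is not needed.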
Because urllib.urlopen returns a file-like object, you can either call .read() on it or iterate over it directly.
See the docs for more.
Edit:
To explain what "directly iterate over it" means:
import urllib
request = urllib.urlopen("http://www.python.org")
for source_line in request:
    print source_line
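For comparison, a rough Python 3 version of the same loop might look like the sketch below. It assumes urllib.request in place of urllib, and uses a data: URL with made-up sample text so it runs offline. Each iteration yields one line as bytes with its trailing newline intact, so joining the decoded lines gives the same string a single read() would.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder three-line body, served through an offline data: URL.
body = "first line\nsecond line\nthird line\n"
demo_url = "data:text/plain," + urllib.parse.quote(body)

decoded_lines = []
with urllib.request.urlopen(demo_url) as response:
    # Iterating the response yields one bytes object per line, newline included.
    for raw_line in response:
        decoded_lines.append(raw_line.decode("utf-8"))

# Joining the lines rebuilds the full text, equivalent to read().decode().
rebuilt = "".join(decoded_lines)
```

Iterating is handy when only some lines matter, while a regex that must match across line breaks is easier to run against the full string.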