-1

filehandle = urllib.urlopen(myurl)

Because of the fact that I want to regex the filehandle afterwords I need to transform the filehandle from an object to a string. How can I make the webpage code to be stored in a string?

george mano
  • 5,948
  • 6
  • 33
  • 43

2 Answers2

3

It's pretty simple:

page = filehandle.read()

You can also iterate over it, like:

lines = []
for line in filehandle:
    lines.append(line)

For extracting data, use BeautifulSoup or lxml.

Anuj Gupta
  • 10,056
  • 3
  • 28
  • 32
3

Because urllib.urlopen returns a file like object, you can either call .read() on it, or directly iterate over it.

See the docs for more

Edit:

Okay to explain what

directly iterate over it

means.

import urllib
request = urllib.urlopen("http://www.python.org")
for source_line in request:
    print source_line
Jakob Bowyer
  • 33,878
  • 8
  • 76
  • 91