
I'm learning BeautifulSoup and I encountered this:

from bs4 import BeautifulSoup
import urllib2

url = "https://en.wikipedia.org/wiki/Katy_Perry"
open_url = urllib2.urlopen(url)
read = open_url.read()
print(read)

This prints the HTML code of the page. But how can we use read() here? It's a file I/O function and should be used with a file object, but the variable "open_url" here isn't a file object.

print(type(open_url))

Output:

<type 'instance'>

Obviously "open_url" isn't a file object, so what makes it possible to call read() on "open_url"?

Uchiha Madara
  • Possible duplicate of [What is the difference between old style and new style classes in Python?](http://stackoverflow.com/questions/54867/what-is-the-difference-between-old-style-and-new-style-classes-in-python) – Łukasz Rogalski Jun 24 '16 at 08:18
  • `open` is an instance of an object, meaning nearly anything can be bound to it (attributes and methods). Note: rename your variable to something else, as `open()` is a builtin function. – Cyrbil Jun 24 '16 at 08:18
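To illustrate the comment above, here is a minimal sketch, assuming Python 3: `Response` is a hypothetical class that is not a file object, and `io.BytesIO` stands in for a real data source. Assigning a bound method as an ordinary instance attribute is enough for `resp.read()` to work.

```python
import io


class Response(object):
    """Hypothetical class: an ordinary object, not a file object."""
    pass


resp = Response()
data = io.BytesIO(b"<html>...</html>")  # stand-in for a real data source

# Attach the buffer's bound read method as a plain instance attribute.
resp.read = data.read

print(isinstance(resp, io.IOBase))  # False: resp is not a file object
print(resp.read())                  # yet resp.read() returns the buffer's bytes
```

Python only cares that the attribute `read` exists and is callable; it never checks whether the object "is" a file.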

1 Answer


If you print open_url, you will see that its fp attribute is a socket._fileobject:

<addinfourl at 139707791457312 whose fp = <socket._fileobject object at 0x7f104303bd50>>

So the underlying file object is actually a socket._fileobject, which you can access with open_url.fp:

<socket._fileobject object at 0x7f104303bd50>

If you remove the first read() call, you will see that you can access the socket object and call .read() on it directly; that is what happens when you call open_url.read(), etc.:

open_url.fp.read()
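A simplified sketch of that delegation pattern, not urllib2's actual code: the class name `FileWrapper` is hypothetical, and `io.BytesIO` stands in for the `socket._fileobject` so the example runs without network access. The wrapper keeps the real file-like object as `fp` and copies its `read`/`readline` methods onto the instance, so calling `read()` on the wrapper really calls `fp.read()`.

```python
import io


class FileWrapper(object):
    """Hypothetical wrapper in the spirit of urllib's addinfourl: not a file
    object itself, but exposing the wrapped object's read methods."""

    def __init__(self, fp, url):
        self.fp = fp                # the real file-like object
        self.url = url
        # Copy the bound methods, so wrapper.read() is really fp.read().
        self.read = fp.read
        self.readline = fp.readline

    def geturl(self):
        return self.url

    def __repr__(self):
        return "<FileWrapper whose fp = %r>" % (self.fp,)


body = io.BytesIO(b"<html><body>Katy Perry</body></html>")
wrapper = FileWrapper(body, "https://en.wikipedia.org/wiki/Katy_Perry")

print(wrapper)          # the repr shows the wrapped BytesIO object
html = wrapper.read()   # delegates to body.read()
print(html)
```

Copying bound methods in `__init__` is only one option; a wrapper could instead forward unknown attributes with `__getattr__`. Either way the caller just sees an object with a working `read()`.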
Padraic Cunningham