42

I want to get all the iframe from a webpage.

Code:

site = "http://" + url
f = urllib2.urlopen(site)
web_content =  f.read()

soup = BeautifulSoup(web_content)
info = {}
content = []
for iframe in soup.find_all('iframe'):
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
    content.append(info)
    print(info)       

pprint(content)

result of print(info):

{'src': u'abc.com', 'width': u'0', 'height': u'0'}
{'src': u'xyz.com', 'width': u'0', 'height': u'0'}
{'src': u'http://www.detik.com', 'width': u'1000', 'height': u'600'}

result of pprint(content):

[{'height': u'600', 'src': u'http://www.detik.com', 'width': u'1000'},
{'height': u'600', 'src': u'http://www.detik.com', 'width': u'1000'},
{'height': u'600', 'src': u'http://www.detik.com', 'width': u'1000'}]

Why is the value of the content not right? It's suppose to be the same as the value when I print(info).

Georgy
  • 12,464
  • 7
  • 65
  • 73
l1th1um
  • 715
  • 2
  • 8
  • 19

4 Answers4

70

You are not creating a separate dictionary for each iframe, you just keep modifying the same dictionary over and over, and you keep adding additional references to that dictionary in your list.

Remember, when you do something like content.append(info), you aren't making a copy of the data, you are simply appending a reference to the data.

You need to create a new dictionary for each iframe.

for iframe in soup.find_all('iframe'):
    info = {}
    ...

Even better, you don't need to create an empty dictionary first. Just create it all at once:

for iframe in soup.find_all('iframe'):
    info = {
        "src": iframe.get('src'),
        "height": iframe.get('height'),
        "width": iframe.get('width'),
    }
    content.append(info)

There are other ways to accomplish this, such as iterating over a list of attributes, or using list or dictionary comprehensions, but it's hard to improve upon the clarity of the above code.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Bryan Oakley
  • 370,779
  • 53
  • 539
  • 685
44

You have misunderstood the Python list object. It is similar to a C pointer-array. It does not actually "copy" the object which you append to it. Instead, it just store a "pointer" to that object.

Try the following code:

>>> d={}
>>> dlist=[]
>>> for i in xrange(0,3):
    d['data']=i
    dlist.append(d)
    print(d)

{'data': 0}
{'data': 1}
{'data': 2}
>>> print(dlist)
[{'data': 2}, {'data': 2}, {'data': 2}]

So why is print(dlist) not the same as print(d)?

The following code shows you the reason:

>>> for i in dlist:
    print "the list item point to object:", id(i)

the list item point to object: 47472232
the list item point to object: 47472232
the list item point to object: 47472232

So you can see all the items in the dlist is actually pointing to the same dict object.

The real answer to this question will be to append the "copy" of the target item, by using d.copy().

>>> dlist=[]
>>> for i in xrange(0,3):
    d['data']=i
    dlist.append(d.copy())
    print(d)

{'data': 0}
{'data': 1}
{'data': 2}
>>> print dlist
[{'data': 0}, {'data': 1}, {'data': 2}]

Try the id() trick, you can see the list items actually point to completely different objects.

>>> for i in dlist:
    print "the list item points to object:", id(i)

the list item points to object: 33861576
the list item points to object: 47472520
the list item points to object: 47458120
fragilewindows
  • 1,394
  • 1
  • 15
  • 26
Wang
  • 7,250
  • 4
  • 35
  • 66
  • 2
    So... you're advocating that the user uses the `.copy()` method instead of just creating a new dictionary on every iteration? I think that's wrong in this specific case. – Bryan Oakley Jul 15 '12 at 14:52
  • In many cases, you may only change part of the item, in the aspect of performance and concision, I prefer `.copy()`. In the aspect of education, `.copy()` also provides a more clear concept. – Wang Jul 15 '12 at 15:04
  • 3
    I think `.copy()` only makes sense when you actually want to copy something. While agree that in some cases yo umay want to change only part of an item, in the case of this particular question I think the OP clearly was intending to create a new dictionary for each iframe, rather than copy-and-modify. – Bryan Oakley Jul 15 '12 at 15:10
  • Well we are completely speculating here ;) No one really sure what the OP wants. It just reminded me my own work, where `.copy()` is more suitable. Why not keep this option open? Even this is not what he wants, education-wise, readers can learn something. – Wang Jul 15 '12 at 15:15
  • One cannot deny though that the OP does not make any optimization on changing only one value if the others stay equal. Creating a new dict is the right way to go here. But there are use-cases for copy of course. – Jonas Schäfer Jul 15 '12 at 15:40
  • 1
    Thanks for your explanation. I learn a lot for you're example. May intent is to make dictionary for every iframe, but your explanation on .copy() is useful for me – l1th1um Jul 15 '12 at 17:11
  • Explained very well, :) – Khumar Jun 10 '23 at 21:59
5

If you want one line:

list_of_dict = [{} for i in range(list_len)]
fragilewindows
  • 1,394
  • 1
  • 15
  • 26
XU Bin
  • 251
  • 3
  • 6
4

info is a pointer to a dictionary - you keep adding the same pointer to your list contact.

Insert info = {} into the loop and it should solve the problem:

...
content = []
for iframe in soup.find_all('iframe'):
    info = {}
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
...
zenpoy
  • 19,490
  • 9
  • 60
  • 87