
Here's the HTML code:

<div class="sizeBlock">
 <div class="size"> 
    <a class="selectSize" id="44526" data-size-original="36.5">36.5</a> 
 </div> 
 <div class="size inactive active"> 
    <a class="selectSize" id="44524" data-size-original="40">40</a> 
 </div> 
 <div class="size "> 
    <a class="selectSize" id="44525" data-size-original="40.5">40.5</a> 
 </div> 
</div>

I want to get the values of the id and data-size-original attributes of each link.

Here's my code:

for sizeBlock in soup.find_all('a', class_="selectSize"):
    aid = sizeBlock.get('id')
    size = sizeBlock.get('data-size-original')

The problem is that this also returns the id values of other links on the page that have the same class, "selectSize".

vS12
Florian

2 Answers


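I think this is what you want. The selector only picks up a.selectSize links inside a div.size, and :not(.inactive) skips the div class='size inactive active', so you won't get the id and size from that one. (A plain div.size would still match it, because that div also has the class size.)

for sizeBlock in soup.select('div.size:not(.inactive) a.selectSize'):
    aid = sizeBlock.get('id')
    size = sizeBlock.get('data-size-original')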
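A minimal sketch to check what each selector actually matches, using the HTML from the question plus one hypothetical link (id "99999", inside a hypothetical div.related) that stands in for the other matches elsewhere on the page. It assumes BeautifulSoup 4.7 or later, where select() is backed by soupsieve and supports :not().

```python
from bs4 import BeautifulSoup

# HTML from the question, plus one hypothetical link outside the size block
# (id "99999") standing in for the unwanted matches elsewhere on the page.
html = """
<div class="sizeBlock">
 <div class="size">
    <a class="selectSize" id="44526" data-size-original="36.5">36.5</a>
 </div>
 <div class="size inactive active">
    <a class="selectSize" id="44524" data-size-original="40">40</a>
 </div>
 <div class="size ">
    <a class="selectSize" id="44525" data-size-original="40.5">40.5</a>
 </div>
</div>
<div class="related">
 <a class="selectSize" id="99999" data-size-original="42">42</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every a.selectSize on the page, including the hypothetical outside link.
everything = [a.get("id") for a in soup.find_all("a", class_="selectSize")]

# div.size alone still matches the "size inactive active" div.
plain = [a.get("id") for a in soup.select("div.size a.selectSize")]

# Only links inside div.sizeBlock whose div.size does not also carry "inactive".
sizes = [
    (a.get("id"), a.get("data-size-original"))
    for a in soup.select("div.sizeBlock div.size:not(.inactive) a.selectSize")
]

print(everything)  # ['44526', '44524', '44525', '99999']
print(plain)       # ['44526', '44524', '44525']
print(sizes)       # [('44526', '36.5'), ('44525', '40.5')]
```

Prefixing the selector with div.sizeBlock keeps links from other parts of the page out of the result even if they also sit in a div.size.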
Benoit Drogou

This has already been answered here: "How to Beautiful Soup (bs4) match just one, and only one, css class".

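Use soup.select with an attribute selector: a[class='selectSize'] compares the whole class attribute, so it only matches links whose class is exactly selectSize. Here's a simple test: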

from bs4 import BeautifulSoup

html_doc = """<div class="size">
<a class="selectSize otherclass" id="44526" data-ean="0193394075362" data-tprice="" data-sku="1171177-36.5" data-size-original="36.5">5</a>
</div>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# soup.find_all('a', class_="selectSize") would include this anchor, because
# class_ matches any tag whose class list contains "selectSize".
# The attribute selector below compares the whole class string, so this anchor
# (class "selectSize otherclass") is skipped and nothing is printed.
for sizeBlock in soup.select("a[class='selectSize']"):
    aid = sizeBlock.get('id')
    size = sizeBlock.get('data-size-original')
    print(aid, size)
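For comparison, a sketch on a hypothetical two-link snippet (ids "1" and "2" are made up): find_all(class_=...) matches any tag whose class list contains the name, while the attribute selector compares the whole class string. A function filter on tag.get('class') is another way to require exactly one class; it relies on BeautifulSoup storing class as a list of names.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one link with only "selectSize", one with an extra class.
html = """
<div class="size"><a class="selectSize" id="1" data-size-original="36.5">36.5</a></div>
<div class="size"><a class="selectSize otherclass" id="2" data-size-original="40">40</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches if "selectSize" is any one of the tag's classes.
contains = [a.get("id") for a in soup.find_all("a", class_="selectSize")]

# Attribute selector: the class attribute string must be exactly "selectSize".
exact_css = [a.get("id") for a in soup.select("a[class='selectSize']")]

# Function filter: class is stored as a list, so compare the whole list.
exact_fn = [
    a.get("id")
    for a in soup.find_all(lambda tag: tag.name == "a" and tag.get("class") == ["selectSize"])
]

print(contains)   # ['1', '2']
print(exact_css)  # ['1']
print(exact_fn)   # ['1']
```

Both exact-match approaches depend on the unwanted links carrying extra classes; if they have exactly class="selectSize" too, scoping by a parent container is needed instead.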
lainatnavi