
I am currently extracting the following HTML from a website using BeautifulSoup, but I am struggling to extract and print the data I need.

For each list entry, I want to extract:

The data-qty value and the link text (the 4 in href="#">4). For example, from the first list entry I am trying to extract the text 4 and data-qty = 1.0000.

The code I am currently using is listed below the HTML.

<div class="content size-options size_us-options" data-sizegroup="size_us" style="display:none">
    <ul class="sizes small-block-grid-4">
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="24" data-price="0" data-qty="1.0000" data-qtymad="0.0000" data-qtybcn="1.0000" data-oblocators="BBAI-0B-05-05" href="#">4</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="172" data-price="0" data-qty="4.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-05-05" href="#">4.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="22" data-price="0" data-qty="10.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-07-05" href="#">5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="160" data-price="0" data-qty="10.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-07-05" href="#">5.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="20" data-price="0" data-qty="9.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">6</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="165" data-price="0" data-qty="11.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">6.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="18" data-price="0" data-qty="28.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">7</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="110" data-price="0" data-qty="41.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">7.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="16" data-price="0" data-qty="53.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">8</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="121" data-price="0" data-qty="68.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-06-02;BBAI-0B-05-05" href="#">8.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="14" data-price="0" data-qty="85.0000" data-qtymad="0.0000" data-qtybcn="4.0000" data-oblocators="BBAI-0B-07-05" href="#">9</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="114" data-price="0" data-qty="64.0000" data-qtymad="0.0000" data-qtybcn="4.0000" data-oblocators="BBAI-0B-07-05" href="#">9.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="12" data-price="0" data-qty="71.0000" data-qtymad="0.0000" data-qtybcn="4.0000" data-oblocators="BBAI-0B-07-05" href="#">10</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="105" data-price="0" data-qty="59.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-07-05" href="#">10.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="10" data-price="0" data-qty="61.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-07-05" href="#">11</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="117" data-price="0" data-qty="39.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-07-05" href="#">11.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="8" data-price="0" data-qty="39.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-07-05" href="#">12</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="202" data-price="0" data-qty="25.0000" data-qtymad="0.0000" data-qtybcn="0.0000" data-oblocators="" href="#">12.5</a>
        </li>
        <li>
            <a rel="nofollow" class="size-button available" data-optionIndex="126" data-price="0" data-qty="26.0000" data-qtymad="0.0000" data-qtybcn="0.0000" data-oblocators="" href="#">13</a>
        </li>
    </ul>
</div>

This is the code I am currently using. I am struggling to extract and print the data I need and would be thankful for any help!

soup = BeautifulSoup(response.content, 'html.parser')
ukattributes = soup.find('div', {'class': 'content size-options size_uk-options'})
print(ukattributes)
sizes = ukattributes.findAll('li')
print(sizes)
for size in sizes:
    response = s.get(size.find('a')['href'])
    soup = BeautifulSoup(response.content, 'html.parser')
    print(size)

Please let me know if you can help me with this, as I have been trying for a while now! Thanks again.

Dan-Dev
Larsson

2 Answers


You can use a simple list comprehension to select the data you need.

ukattributes = soup.find('div', {'class':'content size-options size_us-options'})
data = [ [a.text, a.get('data-qty')] for a in ukattributes.find_all('a') ]
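A self-contained sketch of the same list comprehension, run against a trimmed two-entry excerpt of the HTML from the question. The excerpt, the variable name `size_div` (instead of `ukattributes`) and the `float` conversion are additions for illustration, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Trimmed excerpt of the question's HTML (first two sizes, extra data-* attributes removed).
html = """
<div class="content size-options size_us-options" data-sizegroup="size_us" style="display:none">
    <ul class="sizes small-block-grid-4">
        <li><a rel="nofollow" class="size-button available" data-qty="1.0000" href="#">4</a></li>
        <li><a rel="nofollow" class="size-button available" data-qty="4.0000" href="#">4.5</a></li>
    </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching the full class string works because it equals the attribute value exactly.
size_div = soup.find("div", {"class": "content size-options size_us-options"})

# One [size label, data-qty] pair per link, as in the answer.
data = [[a.text, a.get("data-qty")] for a in size_div.find_all("a")]
print(data)

# Optional: convert the quantity strings to numbers, keyed by size label.
qty_by_size = {size: float(qty) for size, qty in data}
print(qty_by_size)
```

Note that `data-qty` comes back as a string such as "1.0000"; converting it is only needed if you want to compare or sum quantities.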
t.m.adam
  • Thank you so much, this is really helpful in my learning of Python! – Larsson Aug 11 '17 at 21:41
  • I'm glad to hear that, man. List comprehensions are a very handy feature. – t.m.adam Aug 11 '17 at 21:48
  • Do you know any good reads about this so that I can get more familiar with it? – Larsson Aug 11 '17 at 22:12
  • There are many tutorials and e-books out there, but I think practice is better. You can start with something simple like `[i for i in range(10)]`, then move on to more complex structures. – t.m.adam Aug 11 '17 at 22:24
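Following the suggestion in the comments, here is a small progression from `[i for i in range(10)]` toward the shape used in the answer. The `pairs` tuples are plain strings copied from the first three sizes in the question, standing in for the scraped `<a>` tags:

```python
# 1. The simplest comprehension: every value from an iterable.
numbers = [i for i in range(10)]

# 2. Add a filter: keep only the even values.
evens = [i for i in range(10) if i % 2 == 0]

# 3. Build a small list per item, like [a.text, a.get('data-qty')] in the answer.
#    (link text, data-qty) values taken from the question's first three sizes.
pairs = [("4", "1.0000"), ("4.5", "4.0000"), ("5", "10.0000")]
rows = [[size, qty] for size, qty in pairs]

# 4. The same result as an explicit loop, for comparison.
rows_loop = []
for size, qty in pairs:
    rows_loop.append([size, qty])

print(numbers)
print(evens)
print(rows == rows_loop)
```

Step 4 shows that a comprehension is just a compact way of writing the "create an empty list, loop, append" pattern.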

You can't make a GET request to the URL `#`: the fragment part of a URL (everything after `#`) is never sent to the server. It is probably used by JavaScript on the page, or it simply links back to the same page. See my answer to "Pagination giving the first page in every iteration" for more details. So:

response = s.get(size.find('a')['href'])

will not work as you expect. To get the data you want, try:

soup = BeautifulSoup(response.content, 'html.parser')
ukattributes = soup.find('div', {'class': 'content size-options size_us-options'})
print(ukattributes)
sizes = ukattributes.find_all('li')
print(sizes)
for size in sizes:
    link = size.find('a', href=True)  # the <a> tag inside this <li>
    print(link.text)                  # size label, e.g. 4
    print(link['data-qty'])           # quantity, e.g. 1.0000

Output from the loop (first four sizes shown):

4
1.0000
4.5
4.0000
5
10.0000
5.5
10.0000
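To see why requesting a size link cannot return anything new, here is a sketch using the standard library's `urllib.parse`. `page_url` is a hypothetical stand-in for the product page's real address. Resolving `#` against the page, the way a browser would, and dropping the fragment (which is not sent to the server) leaves just the page URL itself. Note also that `requests` will typically reject the bare string `#` outright as an invalid URL with no scheme.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical page address standing in for the real product page.
page_url = "https://example.com/product/123"

# Every size link in the question has href="#".
href = "#"

# Resolve the link relative to the current page, as a browser would.
absolute = urljoin(page_url, href)

# Split off the fragment; only the remaining URL would ever reach the server.
request_target, fragment = urldefrag(absolute)

print(absolute)
print(request_target == page_url)  # True: the "link" is just the same page again
```

So the quantities have to be read from the HTML you already downloaded, as in the loop above, rather than by following the links.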
Dan-Dev