
I was trying to get to the bottom table on the site, but `findAll()` kept returning empty objects, so I went through all the divs on the same level one by one and noticed that when I try to get the last two, it gives me `[]`.

import urllib.request
from bs4 import BeautifulSoup

the_page = urllib.request.urlopen("https://theunderminejournal.com/#eu/sylvanas/item/124105")
bsObj = BeautifulSoup(the_page, 'html.parser')
test = bsObj.findAll('div', {'class': 'page', 'id': "item-page"})
print(test)  # prints []
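
For reference, one way to run the check described above (going through the divs that are actually in the static HTML) is a sketch like the following, assuming the same URL and parser; the exact ids printed depend on what the server sends back:

import urllib.request
from bs4 import BeautifulSoup

# Fetch the raw HTML (note: the '#...' fragment is never sent to the server).
raw = urllib.request.urlopen("https://theunderminejournal.com/#eu/sylvanas/item/124105")
soup = BeautifulSoup(raw, 'html.parser')

# Print the id of every div that has one, so missing ids stand out.
for div in soup.find_all('div', id=True):
    print(div.get('id'))

Per the question, 'item-page' and 'search-page' are expected to be missing from this list, which points to them being added later in the browser.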

I have gone through the bs4 object I got, and the two divs I'm looking for aren't in it. What's happening?

The div I'm looking for is on https://theunderminejournal.com/#eu/sylvanas/item/124105

This is the div I'm trying to extract:

    Can you provide some example html that can be used and will reproduce the problem? ... [mcve] – wwii Mar 20 '18 at 23:24
  • You mean the page I was scraping? It's https://theunderminejournal.com/#eu/sylvanas/item/124105; the div I'm looking for is inside the … and it's … – Andrej Licanin Mar 20 '18 at 23:29
  • You're specifying `'id':"item-page"`. Usually ids are unique, so I would expect this to return 1 item at most (although webpages can break this rule of ids being unique). – sytech Mar 20 '18 at 23:31
  • No I mean include some html in your question (formatted as code) that can be used with your code. [mcve] – wwii Mar 20 '18 at 23:31
  • I used `'id':'something'` to get the rest of the divs on the page and it worked, but it won't for the ones with `"id":"search-page"` and `"id":"item-page"`; they are not even appearing in the bs object – Andrej Licanin Mar 20 '18 at 23:39
  • Please don't post images of code or data. Just copy and paste it, formatted as code, like you did before. We need to be able to copy and paste your code and data into our editors so we can help you. – wwii Mar 21 '18 at 00:04
  • Can you specify which part of that `div` tag you want to scrape exactly? I could suggest a solution without using Selenium with that information. – Keyur Potdar Mar 21 '18 at 04:28
  • Ok, sorry for the image. I'm trying to get the table `` part of the `` – Andrej Licanin Mar 21 '18 at 07:05

1 Answer


You will need to use Selenium instead of the plain urllib/requests approach, because the part of the page you want is rendered by JavaScript.

Note that I couldn't post all of the output as the HTML parsed was huge.

Code:

from bs4 import BeautifulSoup
from selenium import webdriver

# Let a real browser load the page so the JavaScript-rendered content is present.
driver = webdriver.Chrome()
driver.get("https://theunderminejournal.com/#eu/sylvanas/item/124105")

# Parse the rendered DOM instead of the raw response body.
bsObj = BeautifulSoup(driver.page_source, 'html.parser')
test = bsObj.find('div', id='item-page')
print(test.prettify())

Output:

<div class="page" id="item-page" style="display: block;">
 <div class="item-stats">
  <table>
   <tr class="available">
    <th>
     Available Quantity
    </th>
    <td>
     <span>
      30,545
     </span>
    </td>
   </tr>
   <tr class="spacer">
    <td colspan="3">
    </td>
   </tr>
   <tr class="current-price">
    <th>
     Current Price
    </th>
    <td>
     <span class="money-gold">
      27.34
     </span>
    </td>
   </tr>
   <tr class="median-price">
    <th>
     Median Price
    </th>
    <td>
     <span class="money-gold">
      30.11
     </span>
    </td>
   </tr>
   <tr class="mean-price">
    <th>
     Mean Price
    </th>
    <td>
     <span class="money-gold">
      30.52
     </span>
    </td>
   </tr>
   <tr class="standard-deviation">
    <th>
     Standard Deviation
    </th>
    <td>
     <span class="money-gold">
      .
      .
      .
       </span>
      </abbr>
     </td>
    </tr>
   </table>
  </div>
 </div>
</div>
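
Since the goal mentioned in the comments was the stats table inside this div, the rows can be pulled out of the element returned above along these lines (a sketch that assumes `test` from the code above and the structure shown in the output; the exact labels and values will vary with the live data):

# Turn each <th>/<td> row of the stats table into a label/value pair,
# skipping the spacer rows that have no header.
for row in test.find_all('tr'):
    header = row.find('th')
    value = row.find('td')
    if header and value and value.get_text(strip=True):
        print(header.get_text(strip=True), '->', value.get_text(strip=True))
# e.g. Available Quantity -> 30,545
#      Current Price -> 27.34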
Ali
  • Thank you for the answer. Do you know why this is happening, why can't the urllib request get the 2 divs at the bottom? – Andrej Licanin Mar 21 '18 at 07:10
  • I believe that is because the website has some parts that are rendered with JavaScript. Refer to this answer for more explanation: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – Ali Mar 21 '18 at 14:58
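
As the last comment notes, the div is filled in by JavaScript after the page loads, so it can also help to wait for it explicitly before handing `driver.page_source` to BeautifulSoup. A minimal sketch using Selenium's standard explicit-wait helpers (assuming the same URL and a 10-second timeout):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://theunderminejournal.com/#eu/sylvanas/item/124105")

# Block for up to 10 seconds until the item-page div is actually in the DOM.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "item-page"))
)

bsObj = BeautifulSoup(driver.page_source, 'html.parser')
print(bsObj.find('div', id='item-page').prettify())
driver.quit()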