
I was trying to get to the bottom table on the site, but `findAll()` kept returning empty objects, so I went through all the divs on the same level one by one and noticed that when I try to get the last two, it gives me `[]`.

import urllib.request
from bs4 import BeautifulSoup

the_page = urllib.request.urlopen("https://theunderminejournal.com/#eu/sylvanas/item/124105")
bsObj = BeautifulSoup(the_page, 'html.parser')
test = bsObj.findAll('div', {'class': 'page', 'id': "item-page"})
print(test)  # prints []
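
For reference, one way to run the check described above (going through the divs that are actually in the static HTML) is a sketch like the following, assuming the same URL and parser; the exact ids printed depend on what the server sends back:

import urllib.request
from bs4 import BeautifulSoup

# Fetch the raw HTML (note: the '#...' fragment is never sent to the server).
raw = urllib.request.urlopen("https://theunderminejournal.com/#eu/sylvanas/item/124105")
soup = BeautifulSoup(raw, 'html.parser')

# Print the id of every div that has one, so missing ids stand out.
for div in soup.find_all('div', id=True):
    print(div.get('id'))

Per the question, 'item-page' and 'search-page' are expected to be missing from this list, which points to them being added later in the browser.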

I have gone through the bs4 object I got, and the two divs I'm looking for aren't in it. What's happening?

The div I'm looking for is on https://theunderminejournal.com/#eu/sylvanas/item/124105

This is the div I'm trying to extract:

    Can you provide some example html that can be used and will reproduce the problem? ... [mcve] – wwii Mar 20 '18 at 23:24
  • You mean the page I was scraping? It's https://theunderminejournal.com/#eu/sylvanas/item/124105; the div I'm looking for is inside the … and it's … – Andrej Licanin Mar 20 '18 at 23:29
  • You're specifying `'id':"item-page"`. Usually ids are unique, so I would expect this to return 1 item at most (although webpages can break this rule of ids being unique). – sytech Mar 20 '18 at 23:31
  • No I mean include some html in your question (formatted as code) that can be used with your code. [mcve] – wwii Mar 20 '18 at 23:31
  • I used `'id':'something'` to get the rest of the divs on the page and it worked, but it won't for the ones with `"id":"search-page"` and `"id":"item-page"`; they are not even appearing in the bs object – Andrej Licanin Mar 20 '18 at 23:39
  • Please don't post images of code or data. Just copy and paste it, formatted as code, like you did before. We need to be able to copy and paste your code and data into our editors so we can help you. – wwii Mar 21 '18 at 00:04
  • Can you specify which part of that `div` tag you want to scrape exactly? I could suggest a solution without using Selenium with that information. – Keyur Potdar Mar 21 '18 at 04:28
  • Ok, sorry for the image. I'm trying to get the table `` part of the `` – Andrej Licanin Mar 21 '18 at 07:05

1 Answer


You will need to use Selenium instead of the plain urllib/requests approach, because the part of the page you want is rendered by JavaScript.

Note that I couldn't post all of the output as the HTML parsed was huge.

Code:

from bs4 import BeautifulSoup
from selenium import webdriver

# Let a real browser load the page so the JavaScript-rendered content is present.
driver = webdriver.Chrome()
driver.get("https://theunderminejournal.com/#eu/sylvanas/item/124105")

# Parse the rendered DOM instead of the raw response body.
bsObj = BeautifulSoup(driver.page_source, 'html.parser')
test = bsObj.find('div', id='item-page')
print(test.prettify())

Output:

<div class="page" id="item-page" style="display: block;">
 <div class="item-stats">
  <table>
   <tr class="available">
    <th>
     Available Quantity
    </th>
    <td>
     <span>
      30,545
     </span>
    </td>
   </tr>
   <tr class="spacer">
    <td colspan="3">
    </td>
   </tr>
   <tr class="current-price">
    <th>
     Current Price
    </th>
    <td>
     <span class="money-gold">
      27.34
     </span>
    </td>
   </tr>
   <tr class="median-price">
    <th>
     Median Price
    </th>
    <td>
     <span class="money-gold">
      30.11
     </span>
    </td>
   </tr>
   <tr class="mean-price">
    <th>
     Mean Price
    </th>
    <td>
     <span class="money-gold">
      30.52
     </span>
    </td>
   </tr>
   <tr class="standard-deviation">
    <th>
     Standard Deviation
    </th>
    <td>
     <span class="money-gold">
      .
      .
      .
       </span>
      </abbr>
     </td>
    </tr>
   </table>
  </div>
 </div>
</div>
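
Since the goal mentioned in the comments was the stats table inside this div, the rows can be pulled out of the element returned above along these lines (a sketch that assumes `test` from the code above and the structure shown in the output; the exact labels and values will vary with the live data):

# Turn each <th>/<td> row of the stats table into a label/value pair,
# skipping the spacer rows that have no header.
for row in test.find_all('tr'):
    header = row.find('th')
    value = row.find('td')
    if header and value and value.get_text(strip=True):
        print(header.get_text(strip=True), '->', value.get_text(strip=True))
# e.g. Available Quantity -> 30,545
#      Current Price -> 27.34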
Ali
  • Thank you for the answer. Do you know why this is happening, why can't the urllib request get the 2 divs at the bottom? – Andrej Licanin Mar 21 '18 at 07:10
  • I believe that is because the website has some parts that are rendered with JavaScript. Refer to this answer for more explanation: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – Ali Mar 21 '18 at 14:58
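
As the last comment notes, the div is filled in by JavaScript after the page loads, so it can also help to wait for it explicitly before handing `driver.page_source` to BeautifulSoup. A minimal sketch using Selenium's standard explicit-wait helpers (assuming the same URL and a 10-second timeout):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://theunderminejournal.com/#eu/sylvanas/item/124105")

# Block for up to 10 seconds until the item-page div is actually in the DOM.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "item-page"))
)

bsObj = BeautifulSoup(driver.page_source, 'html.parser')
print(bsObj.find('div', id='item-page').prettify())
driver.quit()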