
I'm very new to web scraping and I'm struggling to get the values of two attributes from a specific element.

I want to find the data-diffusion-decimal and data-diffusion-history attributes.


soup.findAll('div',attrs={"class":"RC-runnerPriceWrapper"})

What I get back is:

<div class="RC-runnerPriceWrapper PC-bestOddsContainer js-diffusionHorsesList js-horsesList js-bestOddsPriceContainer" data-diffusion-horsename="Dinons">  <a class="ui-btn RC-runnerPrice ui-priceBtn_noPrice js-diffusionPriceValue js-betHandler js-runnerPrice js-runnerPriceBestOdds" data-test-selector="RC-cardPage-runnerPrice" href="#"></a>

This is as far as I get, but what I need isn't contained in the result. Any advice is greatly appreciated.

Andersson
Monkeydave
  • Most likely those attributes are generated dynamically, so you can only get them with a tool that supports JavaScript execution – Andersson Oct 01 '18 at 13:19
  • Could you share the URL of the page that you are trying to scrape this information from? – Anuvrat Parashar Oct 01 '18 at 13:20
  • Thanks Andersson, so I would need to use something like Selenium? As mentioned in the post, I'm very new to web scraping, so any advice on what to use or what to do is really appreciated – Monkeydave Oct 01 '18 at 13:20
  • @Monkeydave, Selenium, requests-html, PyQt5... You can also check the XHRs in the Network tab (F12); possibly you can get those values by simulating the XHR – Andersson Oct 01 '18 at 13:31

2 Answers


These attributes may be set dynamically by JavaScript. To check, do not rely on the browser's developer tools, which show the page after scripts have run; instead, right-click the page and choose 'View page source'.

If you cannot find these attributes in the page source, they are set by JavaScript, and you need a tool like Selenium to execute the dynamic part of the page.
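The same check can be done in code. This is only a minimal sketch using the standard library's html.parser; the helper name find_attr_values and the sample markup (a shortened version of the snippet in the question) are assumptions for illustration. In practice you would pass in the raw HTML that your HTTP client downloaded.

```python
from html.parser import HTMLParser


class AttributeFinder(HTMLParser):
    """Collect the values of one attribute from every tag that carries it."""

    def __init__(self, attr_name):
        super().__init__()
        # HTMLParser lowercases attribute names, so compare in lowercase.
        self.attr_name = attr_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def find_attr_values(html, attr_name):
    """Return every value of attr_name found in the raw (unrendered) HTML."""
    finder = AttributeFinder(attr_name)
    finder.feed(html)
    finder.close()
    return finder.values


# Shortened, illustrative version of the markup from the question.
raw_html = (
    '<div class="RC-runnerPriceWrapper" data-diffusion-horsename="Dinons">'
    '<a class="ui-btn RC-runnerPrice" href="#"></a></div>'
)

# An empty list suggests the attribute is added later by JavaScript.
missing = find_attr_values(raw_html, "data-diffusion-decimal")
present = find_attr_values(raw_html, "data-diffusion-horsename")
```

If the list comes back empty for the raw HTML but the attribute is visible in the browser's inspector, that points to the value being filled in client-side.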

Workaround: using the 'Network' tab of your browser's developer tools, check whether an AJAX request is made to fetch the data that ends up in the attributes. Instead of parsing the page, you can make the same request yourself and perhaps get the information in JSON format.
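A rough sketch of that workaround, using only the standard library. The function name fetch_json is hypothetical, and the real endpoint URL, any required headers, and the shape of the returned JSON are unknown here; you would have to copy them from the request you see in the Network tab.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10):
    """Request url the way the page's script would and decode the JSON body.

    The Accept header is an assumption; copy the real headers from the
    request shown in the browser's Network tab if the server needs them.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

You would call it with the XHR's URL and then look through the returned dictionary or list for the price values, rather than parsing the HTML at all.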

Corentin Limier

Use Selenium, with something like this:

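from selenium.webdriver.common.by import By  # find_element_by_* helpers were removed in Selenium 4
driver.find_element(By.CSS_SELECTOR, 'div.RC-runnerPriceWrapper').get_attribute('data-diffusion-decimal')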
Andersson