
I am trying to reproduce the code from the accepted answer to this post, Issue scraping page with "Load more" button with rvest, on the website https://www.coindesk.com/. However, the following line gives an error:

# original
# load_btn <- ffd$findElement(using = "css selector", ".load-more .btn")
# modified
load_btn <- ffd$findElement(using = "css selector", ".load-more-stories .btn")

Selenium message: Unable to locate element: load-more-stories
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'LAPTOP-sdsds9L', ip: 'sdssd', os.name: 'Windows 10', os.arch: 'x86', os.version: '10.0', java.version: '1.8.0_211'
Driver info: driver.version: unknown

Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method

I assumed the button name based on lines 449-452 of the page source:

</div>
<div id="load-more-stories">
    <button>Load More Stories</button>
</div>
</div>

Any idea how to adapt this strategy properly?

user3091668

3 Answers


DIAGNOSIS: you are running into this problem because the page does not redirect to another page; instead, it appends new article links to the current page. I wrote this using Web Scraping Language:

GOTO www.coindesk.com >> CRAWL ['#load-more-stories', 3] .stream-article >> EXTRACT {'title':'.meta h1', 'article':'.article-content'}

EXPLANATION: This should crawl all the articles up to the 3rd page by clicking the #load-more-stories ("Load More Stories") button at the bottom. It then visits every link matching the selector .stream-article and, on each article page, extracts the title and article body using the respective selectors.

qimisle
  • Any idea how to adapt it using the RSelenium package? – user3091668 Aug 08 '19 at 21:55
  • @user3091668 It's library agnostic; it's just a cloud-based service, which means you write WSL and it crawls and scrapes all the data. It's also easy to read, which means you can maintain it down the road instead of dealing with actual code. – user299709 Aug 09 '19 at 18:40

You first need to dismiss the cookie bar by clicking the accept button, then target load-more-stories as an id, not a class. I can't test in R, but something like:

# accept the cookie dialog first, otherwise it blocks the click
cookie_button <- ffd$findElement("css selector", "#CybotCookiebotDialogBodyLevelButtonAccept")
cookie_button$clickElement()

# the target is an id, so use the "#" prefix in the CSS selector
load_more_button <- ffd$findElement("css selector", "#load-more-stories")
load_more_button$clickElement()
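A fuller sketch along the same lines (untested; it assumes a Selenium session is already open in `ffd`, and the selectors are taken from the snippets above): click the button a few times, then hand the rendered HTML to rvest:

```r
library(RSelenium)
library(rvest)

ffd$navigate("https://www.coindesk.com/")

# dismiss the cookie dialog if it is present
cookie_button <- tryCatch(
  ffd$findElement("css selector", "#CybotCookiebotDialogBodyLevelButtonAccept"),
  error = function(e) NULL
)
if (!is.null(cookie_button)) cookie_button$clickElement()

# click "Load More Stories" a few times, pausing for new articles to load
for (i in 1:3) {
  load_more <- ffd$findElement("css selector", "#load-more-stories button")
  load_more$clickElement()
  Sys.sleep(2)
}

# parse the fully loaded page with rvest
page <- read_html(ffd$getPageSource()[[1]])
titles <- page %>% html_nodes(".stream-article") %>% html_text(trim = TRUE)
```

The `Sys.sleep(2)` pause is a crude wait; a more robust version would poll for the new elements before clicking again.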

References:

  1. https://cran.r-project.org/web/packages/RSelenium/RSelenium.pdf
QHarr

An HTML id= attribute is not the same as a CSS class: in a CSS selector, ids are matched with # and classes with a leading dot.

Your selector therefore does not match anything on the page.
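To illustrate with the markup from the question (the element names follow the question's HTML snippet):

```r
# the markup is <div id="load-more-stories">, so an id selector is needed:
load_btn <- ffd$findElement(using = "css selector", "#load-more-stories")

# ".load-more-stories" would instead look for class="load-more-stories",
# which does not exist on this page, hence the NoSuchElementException
```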

Has QUIT--Anony-Mousse