I want to scrape the following page, 'https://www.idealista.com/alquiler-viviendas/girona-provincia/', with the rvest package, but it gives me this error: 'Error in open.connection(x, "rb") : HTTP error 403.'

library(rvest)
library(curl)
library(xml2)

url <- 'https://www.idealista.com/alquiler-viviendas/girona-provincia/'
webidealista <- read_html(url)

Error in open.connection(x, "rb") : HTTP error 403.

Can someone help me fix it? I'll be very grateful.

Retore
  • 1
  • Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Please add data using dput and show the expected output for the same. Please read the info about [How to ask good question](https://stackoverflow.com/help/minimal-reproducible-example) & [Reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – rj-nirbhay Jun 03 '20 at 19:19
  • What are you trying to scrape from the webpage, exactly? – windyvation Mar 15 '21 at 22:51

1 Answer

I was able to get the HTML content of the page with the following code:

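library(RSelenium)
# Start a Selenium Firefox container (requires Docker).
# Note: shell() exists only on Windows; use system() on other platforms.
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')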
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://www.idealista.com/alquiler-viviendas/girona-provincia/")

# Accept the consent pop-up so it no longer covers the page
web_Obj_Accept <- remDr$findElement("xpath", "//*[@id='didomi-notice-agree-button']/span")
web_Obj_Accept$clickElement()

# Get content ...
html_Content <- remDr$getPageSource()[[1]]
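
Since the question was about rvest, the page source obtained above can be handed back to it, because read_html() accepts a string of HTML as well as a URL. The following is only a minimal sketch: the sample markup, the variable html_Sample and the a.item-link selector are invented for illustration and are not taken from the real page, so the selector would need to be adapted after inspecting the actual HTML.

```r
library(rvest)

# Hypothetical stand-in for the string returned by remDr$getPageSource()[[1]];
# the markup and the class name are assumptions made for this example.
html_Sample <- '<html><body>
  <article><a class="item-link" href="/inmueble/1/">Flat A</a></article>
  <article><a class="item-link" href="/inmueble/2/">Flat B</a></article>
</body></html>'

# Parse the HTML string (no HTTP request is made here)
page <- read_html(html_Sample)

# Select the (hypothetical) listing links, then extract their text and href
links  <- html_elements(page, "a.item-link")
titles <- html_text2(links)
hrefs  <- html_attr(links, "href")
```

In practice, html_Content would be passed to read_html() in place of html_Sample.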
Emmanuel Hamel
  • 1,769
  • 7
  • 19