
I am trying to do some web scraping with R on the following website: https://www.ictax.admin.ch/extern/de.html#/ratelist/2022. I need the information in the exchange rate table. This is my first web scraping project, so I am unsure whether I am doing something wrong.

I am working on a company network (mentioning this since it seems to be a question that is often asked).

My first approach was to use the rvest package with the following code:

url <-"https://www.ictax.admin.ch/extern/de.html#/ratelist/2022"
exchange_rates <- url%>% read_html()%>% html_nodes(xpath='//*[@id="exchangeRates"]/div/div[2]')

Unfortunately, this did not work.

I also tried several other approaches:

exchange_rates <- url %>% read_html() %>% html_nodes("table")

exchange_rates <- url %>% read_html() %>% html_nodes('#exchangeRates')

I always get {xml_nodeset (0)}.

Since I get {xml_nodeset (0)} no matter what I try, I concluded that the page loads its content dynamically, as described in https://stackoverflow.com/questions/57547825/xml-nodeset-0-issue-when-webscraping-table. I then tried the solution described there.
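For completeness, here is a quick check (a minimal sketch, not one of my original attempts) that the table really is absent from the static HTML:

library(rvest)

page <- read_html("https://www.ictax.admin.ch/extern/de.html#/ratelist/2022")

# If "exchangeRates" never appears in the downloaded HTML, the table
# must be injected later by JavaScript rather than shipped with the page.
grepl("exchangeRates", as.character(page), fixed = TRUE)

# Likewise, count how many <table> elements the static HTML actually contains.
length(html_nodes(page, "table"))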

I found the following URL in the network tab, which seems to deliver the data: https://www.ictax.admin.ch/extern/api/coreGadget/exchangeRates.json. However, requesting this URL directly does not work. (I also found solutions using the V8 package, but because of company restrictions it would be difficult for me to work with it.)
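As an illustration of what "not working" means here, a direct request along these lines (a minimal sketch using httr) can be used to inspect what the endpoint returns:

library(httr)

# Hypothetical direct request to the endpoint seen in the network tab;
# if the server blocks such requests, the status code will not be 200
# and/or the content type will not be JSON.
resp <- GET("https://www.ictax.admin.ch/extern/api/coreGadget/exchangeRates.json")
status_code(resp)
http_type(resp)   # "application/json" would indicate usable data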

Does anyone know a way to get the data from this website using R?

gada
  • The data you are looking for is not in the HTML source, so `read_html` won't work. The data is loaded via JavaScript after the page loads. You found the link where it's pulling the data from, but the server prevents direct requests to that URL to protect the data. You'll need to emulate a web browser using either V8 or RSelenium. – MrFlick Dec 20 '22 at 16:16
  • Thanks very much for your comment. Using V8 or RSelenium is what I was trying to avoid, but if someone can help me with an RSelenium solution, that would be really appreciated. RSelenium seems to be a pretty involved package for a beginner in web crawling like myself. – gada Dec 21 '22 at 08:05

1 Answer


My understanding is that the page takes a few seconds to load. Hence, when you ask for the exchange rate information, the table is not yet present in the page. We can work around this by using RSelenium together with Sys.sleep.

With the code below, I have been able to extract the information in the table.

library(RSelenium)

# Start a standalone Firefox container and connect to it
shell('docker run -d -p 4446:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")

remDr$open()
remDr$navigate("https://www.ictax.admin.ch/extern/de.html#/ratelist/2022")

# Give the JavaScript a few seconds to render the table
Sys.sleep(5)

# Grab the rendered table and split its text into one string per row
web_Obj_Table <- remDr$findElement("xpath", '//*[@id="exchangeRates"]/div/div[2]/table')
text_Table <- web_Obj_Table$getElementText()[[1]]
text_Table <- strsplit(text_Table, "\n")[[1]]
text_Table

1] "AUD 1 = 0.627450 CHF"    "CAD 1 = 0.682860 CHF"    "EUR 1 = 0.987450 CHF"    "GBP 1 = 1.112933 CHF"   
[5] "HKD 100 = 11.854300 CHF" "JPY 100 = 0.701200 CHF"  "USD 1 = 0.925228 CHF"

I have also been able to extract the values with Internet Explorer using the RDCOMClient R package:

library(RDCOMClient)
library(stringr)

url <- "https://www.ictax.admin.ch/extern/de.html#/ratelist/2022"

# Launch Internet Explorer via COM and open the page
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)

# Wait long enough for the JavaScript to render the table
Sys.sleep(20)

doc <- IEApp$Document()

# Select the rendered table and split its text into one string per rate
web_Obj <- doc$querySelector("#exchangeRates > div > div.panel-body > table")
text <- web_Obj$innerText()
text <- stringr::str_remove_all(text, "\r\n")
text <- stringr::str_replace_all(text, "CHF", "CHF;")   # insert a separator after each rate so we can split on it
text <- stringr::str_split(text, ";")[[1]]
text

[1] "         AUD 1 = 0.627450 CFH" "  CAD 1 = 0.682860 CFH"        "  EUR 1 = 0.987450 CFH"       
[4] "  GBP 1 = 1.112933 CFH"        "  HKD 100 = 11.854300 CFH"     "  JPY 100 = 0.701200 CFH"     
[7] "  USD 1 = 0.925228 CFH"        "  " 
Emmanuel Hamel