
Is there any way to scrape data in R for:

General Information/Launch Date from this website: https://www.euronext.com/en/products/etfs/LU1437018838-XAMS/market-information

So far I have used the code below, but the generated XML file does not contain the information that I need:

library(rvest)
library(XML)

url <- "https://www.euronext.com/en/products/etfs/LU1437018838-XAMS/market-information"

# Save the page locally, then parse the saved file
download.file(url, destfile = "scrapedpage.html", quiet = TRUE)
content <- read_html("scrapedpage.html")

content1 <- htmlTreeParse("scrapedpage.html", error = function(...) {}, useInternalNodes = TRUE)
  • "the generated XML file does not contain Information that I Need" What information is that exactly? How does that differ from what you get? – camille Aug 09 '18 at 14:19
  • You could use xpathSApply to parse the data you need from the content variable. This will involve a bit of manual work to specify exactly which pieces of the page you require. – Kharoof Aug 09 '18 at 14:29
  • When you open the link, you can see General Information/Launch Date, and I need the information: 16 May 2017. But it is not shown in the XML file; that is what I mean. – Thang Do Aug 09 '18 at 15:06
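For reference, here is a minimal sketch of the xpathSApply approach suggested in the comment above, applied to the content1 document from the question. Note that the launch date itself is not in the downloaded static HTML (it is loaded via AJAX, as the answer below shows), so XPath queries against content1 can only reach nodes that actually exist in that file; the page title is used here purely as an illustration.

library(XML)

# Query a node that does exist in the static HTML, e.g. the page <title>,
# to show how xpathSApply extracts a specific element with an XPath expression
page_title <- xpathSApply(content1, "//title", xmlValue)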

1 Answer


What you are trying to scrape is in an AJAX object called factsheet (I don't know JavaScript, so I can't tell you more). Here is a solution to get what you want: find the URL of the data used by the JavaScript with the network analysis panel in your browser (look for the XHR requests). See here.

library(rvest)

# Read the AJAX factsheet fragment directly and pull the launch date
# out of the <strong> node that holds it
factsheet <- read_html("https://www.euronext.com/en/factsheet-ajax?instrument_id=LU1437018838-XAMS&instrument_type=etfs")
launch_date <- factsheet %>%
  html_nodes(xpath = "/html/body/div[2]/div[1]/div[3]/div[4]/strong") %>%
  html_text()
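If the absolute XPath ever breaks because Euronext changes the page layout, a label-based selector is an alternative. The sketch below is an assumption, not verified against the live page: it presumes the value sits in a <strong> element inside the same <div> as the "Launch Date" label, which matches the structure implied by the path above.

library(rvest)

# Hypothetical, more layout-tolerant variant: locate the <div> whose own text
# mentions "Launch Date" and read its <strong> child instead of hard-coding the path
factsheet <- read_html("https://www.euronext.com/en/factsheet-ajax?instrument_id=LU1437018838-XAMS&instrument_type=etfs")
launch_date <- factsheet %>%
  html_nodes(xpath = "//div[contains(text(), 'Launch Date')]/strong") %>%
  html_text(trim = TRUE)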