I need to extract the table with the columns "R.U.T" and "Entidad" from this page:

http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554

I wrote the following code:

library(rvest)

# read the page
url <- "http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554.html"
page <- read_html(url)

# extract the table by XPath
table <- html_node(page, xpath = '//*[@id="listado_fiscalizados"]/table')
table <- html_table(table)

# convert the table to a data.frame
table <- data.frame(table)

but R shows me the following result:

> a
{xml_nodeset (0)}

That is, it is not recognizing the table. Maybe it's because the table has hyperlinks?

If anyone knows how to extract the table, I would appreciate it. Many thanks in advance and sorry for my English.

alistaire
user119144
  • It looks like the table is loaded with JavaScript, so you'll need to grab the HTML via RSelenium or the like. [Here's a recent example](http://stackoverflow.com/a/41497119/4497050) that you should be able to translate directly. – alistaire Jan 10 '17 at 22:07
  • I knew about RSelenium, but I wanted to try a different kind of solution. Thank you very much for your answer; if I don't find another solution I will use RSelenium :) – user119144 Jan 11 '17 at 01:39

2 Answers

The page makes an XHR request to a separate resource, and the response from that resource is what builds the table. You can read that resource directly:

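library(rvest)
library(dplyr)

# Read the resource that the page requests via XHR
pg <- read_html("http://www.svs.cl/institucional/mercados/consulta.php?mercado=S&Estado=VI&consulta=CSVID&_=1484105706447")

html_nodes(pg, "table") %>%
  html_table() %>%
  .[[1]] %>%        # take the first table in the response
  as_tibble() %>%   # tbl_df() is deprecated in current dplyr
  select(1:2)       # keep R.U.T. and Entidad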
## # A tibble: 36 × 2
##        R.U.T.                                            Entidad
##         <chr>                                              <chr>
## 1  99588060-1                           ACE SEGUROS DE VIDA S.A.
## 2  76511423-3                               ALEMANA SEGUROS S.A.
## 3  96917990-3                      BANCHILE SEGUROS DE VIDA S.A.
## 4  96933770-3                          BBVA SEGUROS DE VIDA S.A.
## 5  96573600-K                              BCI SEGUROS VIDA S.A.
## 6  96656410-5                 BICE VIDA COMPAÑIA DE SEGUROS S.A.
## 7  96837630-6            BNP PARIBAS CARDIF SEGUROS DE VIDA S.A.
## 8  76418751-2 BTG PACTUAL CHILE S.A. COMPAÑIA DE SEGUROS DE VIDA
## 9  76477116-8                            CF SEGUROS DE VIDA S.A.
## 10 99185000-7           CHILENA CONSOLIDADA SEGUROS DE VIDA S.A.
## # ... with 26 more rows

You can use Developer Tools in any modern browser to monitor the Network requests to find that URL.
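
Once the URL is known, the request and the parsing step can be wrapped in small helpers. This is only a sketch: `build_consulta_url()` and `first_two_columns()` are hypothetical names, the trailing `_=...` parameter is assumed to be a cache-buster that can be dropped, and the inline HTML is a made-up fragment that reuses two rows from the output above so the parsing can be checked without network access.

```r
library(rvest)

# Hypothetical helper: rebuild the endpoint URL from its query parameters.
# Assumption: the "_" parameter in the original URL is only a cache-buster.
build_consulta_url <- function(mercado = "S", estado = "VI", consulta = "CSVID") {
  base <- "http://www.svs.cl/institucional/mercados/consulta.php"
  paste0(base, "?mercado=", mercado, "&Estado=", estado, "&consulta=", consulta)
}

# Hypothetical helper: parse the first <table> and keep its first two columns.
first_two_columns <- function(doc) {
  tables <- html_table(html_nodes(doc, "table"))
  if (length(tables) == 0) stop("no <table> found in the document")
  tables[[1]][, 1:2]
}

# Offline demonstration on a small, made-up HTML fragment
sample_html <- '<table>
  <tr><th>R.U.T.</th><th>Entidad</th><th>Extra</th></tr>
  <tr><td>99588060-1</td><td>ACE SEGUROS DE VIDA S.A.</td><td>x</td></tr>
  <tr><td>76511423-3</td><td>ALEMANA SEGUROS S.A.</td><td>x</td></tr>
</table>'
demo <- first_two_columns(read_html(sample_html))
print(demo)

# Against the live endpoint (needs network access, not run here):
# live <- first_two_columns(read_html(build_consulta_url()))
```

If the server turns out to require the `_` parameter after all, appending the current timestamp in milliseconds to the URL would be the first thing to try.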

hrbrmstr
  • This is the solution I was looking for. I changed the URL and XPath in my code and it worked. Thank you very much. One question: how did you know the table came from a separate resource? – user119144 Jan 11 '17 at 04:06
  • "You can use Developer Tools in any modern browser to monitor the Network requests to find that URL.". It's worth the effort to poke at browser "Inspect" / "Inspect Element" / "Developer Tools". Tons of good stuff under the covers of most web pages. – hrbrmstr Jan 11 '17 at 04:09

This is the answer using RSelenium:

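library(RSelenium)  # remoteDriver(), checkForServer(), startServer()
library(XML)        # htmlParse(), readHTMLTable()

# Start the Selenium server.
# Note: checkForServer() and startServer() are defunct in later RSelenium
# releases, which provide rsDriver() instead.
RSelenium::checkForServer(beta = TRUE)
selServ <- RSelenium::startServer(javaargs = c("-Dwebdriver.gecko.driver=\"C:/Users/Mislav/Documents/geckodriver.exe\""))
remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open() # silent = TRUE
Sys.sleep(2)

# Load the page in the browser and parse the rendered source
remDr$navigate("http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554.html")
Sys.sleep(2)  # give the JavaScript time to build the table
doc <- htmlParse(remDr$getPageSource()[[1]], encoding = "UTF-8")

# Close the browser and stop the server
remDr$close()
selServ$stop()

# Read every table found in the rendered page
tables <- readHTMLTable(doc)
head(tables)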
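
The rendered source can also be handed to rvest, which keeps the XPath from the question. The following is a minimal sketch rather than part of the original answer: `extract_listado()` is a hypothetical helper, and the two inline strings are made-up stand-ins for the page source before and after the JavaScript has run.

```r
library(rvest)

# Hypothetical helper: pull the table inside #listado_fiscalizados from a
# page-source string, or return NULL when the table is not present.
extract_listado <- function(page_source) {
  doc <- read_html(page_source)
  nodes <- html_nodes(doc, xpath = '//*[@id="listado_fiscalizados"]//table')
  if (length(nodes) == 0) return(NULL)
  html_table(nodes[[1]])
}

# Made-up stand-in for the static HTML: the container exists but is empty
static_html <- '<div id="listado_fiscalizados"></div>'

# Made-up stand-in for the rendered HTML: the table has been injected
rendered_html <- '<div id="listado_fiscalizados"><table>
  <tr><th>R.U.T.</th><th>Entidad</th></tr>
  <tr><td>96573600-K</td><td>BCI SEGUROS VIDA S.A.</td></tr>
</table></div>'

static_result <- extract_listado(static_html)
rendered_result <- extract_listado(rendered_html)
print(rendered_result)

# With a live Selenium session (not run here):
# tbl <- extract_listado(remDr$getPageSource()[[1]])
```

The `NULL` result for the static string mirrors what the question ran into: the container is there, but the table only exists after the browser has executed the page's scripts.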
Mislav
  • You need to show what packages you're loading at the top; it looks like `XML` as well as `RSelenium`. – alistaire Jan 10 '17 at 23:02
  • Thank you very much for your answer, this works :D. Anyway, I will keep looking for a solution without RSelenium. – user119144 Jan 11 '17 at 01:57