
I am trying to scrape an HTML table into a data.frame in R. I followed the existing solution [Scraping HTML tables into R data frames using the XML package](https://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package), but R raised the error below:

> theurl <- "https://www.dwd.de/DE/leistungen/klimadatendeutschland/statliste/statlex_html.html?view=nasPublication&nn=16102"
> webpage <- getURL(theurl)
Error in function (type, msg, asError = TRUE)  : 
  error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

I don't understand why this error occurs. Ultimately, I want to read the German weather station table into a data.frame in R, but `getURL` fails with the unknown-protocol error above. How can I fix the error and scrape the table? Thanks.

Here is my session information so you can reproduce this on your machine.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] rlist_0.4.6.1  RCurl_1.95-4.8 bitops_1.0-6  
[4] XML_3.98-1.5  

loaded via a namespace (and not attached):
[1] compiler_3.4.3    tools_3.4.3       yaml_2.1.14      
[4] data.table_1.10.4

Update: my attempt based on the suggested SO solution

theurl <- httr::GET("https://www.dwd.de/DE/leistungen/klimadatendeutschland/statliste/statlex_html.html?view=nasPublication&nn=16102", .opts = list(ssl.verifypeer = FALSE))
tables <- readHTMLTable(theurl)

Here is the error:

> tables <- readHTMLTable(theurl)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"response"’
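
Presumably `readHTMLTable` has no method for an `httr` response object, so the HTML body would need to be pulled out of the response as text first. A rough sketch of that idea (assuming the `httr` and `XML` packages are attached):

resp <- httr::GET("https://www.dwd.de/DE/leistungen/klimadatendeutschland/statliste/statlex_html.html?view=nasPublication&nn=16102")

# extract the raw HTML from the response, then parse it so readHTMLTable() can use it
html_text <- httr::content(resp, as = "text", encoding = "UTF-8")
doc <- XML::htmlParse(html_text, asText = TRUE)
tables <- XML::readHTMLTable(doc)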
  • @MarcoSandri I just tried, but `getURL` returns an empty character vector (no value in it) when I assign the result to `theurl`. Any more thoughts? I am basically following the SO solution [scraping HTML tables in R](https://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package/37199673). Any idea? – Andy.Jian Mar 09 '18 at 16:19
  • I would recommend using the `httr` package. It seems to be better with the SSL certificate stuff required for `https` pages. Try `httr::GET(theurl)`. – MrFlick Mar 09 '18 at 16:26
  • @MrFlick do you have a customized solution to read the HTML table above ([German weather station data](https://www.dwd.de/DE/leistungen/klimadatendeutschland/statliste/statlex_html.html?view=nasPublication&nn=16102)) as a data.frame in R? The SO case for scraping an HTML table in R seems a bit different from mine. I updated my attempt above. – Andy.Jian Mar 09 '18 at 16:35
  • Why not the `rvest` package? – Aleh Mar 09 '18 at 16:40
  • If the problem is really parsing the table, then the `rvest` package might be more helpful. Something like `library(rvest); dd <- theurl %>% read_html() %>% html_table(); dd` – MrFlick Mar 09 '18 at 16:43
  • That particular SO question is old and the "best/newest" answer was down toward the bottom: https://stackoverflow.com/a/37199673/2372064 – MrFlick Mar 09 '18 at 16:44
  • @MrFlick it worked perfectly, thanks for your help :) – Andy.Jian Mar 09 '18 at 17:07

1 Answer


This is how you can do it with rvest:

library("rvest")

url <-"https://www.dwd.de/DE/leistungen/klimadatendeutschland/statliste/statlex_html.html?view=nasPublication&nn=16102"

data <- url %>%
    read_html() %>%
    html_nodes(xpath='/html/body/font/table') %>%
    html_table()

data <- data[[1]]
head(data)
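
If the absolute XPath ever breaks (it relies on the page keeping its `<font>` wrapper), a slightly more defensive variant is to grab every table on the page and keep the first one. A sketch of that, assuming the station list remains the first `<table>` element on the page:

library(rvest)

page <- read_html(url)
# parse all <table> elements instead of relying on the exact XPath
tables <- page %>% html_nodes("table") %>% html_table(fill = TRUE)
data <- tables[[1]]   # assumed: the station list is the first table
str(data)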