
I am trying to write a script that downloads a file from the API of the official Polish air quality monitoring service (http://powietrze.gios.gov.pl/pjp/content/api). The location on the server is:

URL  <- "http://api.gios.gov.pl/pjp-api/rest/station/findAll"

The problem is that I am working on a remote server on a company network, so this may be caused by a firewall or proxy. I had a similar problem with web scraping before, and the solution from rvest Error in open.connection(x, "rb") : Timeout was reached helped. Unfortunately, that is not the case this time. I tried downloading the file:

URL  <- "http://api.gios.gov.pl/pjp-api/rest/station/findAll"
File_name <- "tmp.csv"
download.file(URL, destfile = File_name, quiet=TRUE)  

but afterwards the file is unreadable (reading it fails with an incomplete-line message from readTableHeader). When I instead downloaded the file in .json format, as in https://www.tutorialspoint.com/r/r_json_files.htm, and read it with fromJSON, I got Error in fromJSON(file = File_name) : argument "txt" is missing, with no default
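
Both messages are consistent with the download itself being fine: the endpoint returns JSON, so reading it as a CSV trips readTableHeader, and `file =` is the argument name of rjson::fromJSON(), whereas jsonlite::fromJSON() calls its first argument `txt` (a JSON string, a URL or a file path). A minimal offline sketch of the jsonlite call, assuming jsonlite is installed; the two-record JSON below is invented for illustration and is not the real API response:

```r
library(jsonlite)  # install.packages("jsonlite") if needed

# Invented sample data standing in for the downloaded station list.
json_path <- tempfile(fileext = ".json")
writeLines('[{"id": 1, "name": "Station A"}, {"id": 2, "name": "Station B"}]',
           json_path)

# Pass the path as the first argument (`txt`); jsonlite reads the file and
# simplifies the array of objects into a data frame.
stations <- fromJSON(json_path)
print(stations)

# The rjson package, by contrast, does use a `file` argument:
# stations_list <- rjson::fromJSON(file = json_path)
```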

I have also tried using fromJSON(URL), as suggested in https://cran.r-project.org/web/packages/jsonlite/vignettes/json-apis.html, but I get the error: Error in open.connection(con, "rb") : Timeout was reached. I set options(timeout = 4000000), but it did not help.
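
If a firewall silently drops direct connections, a larger options(timeout) only makes R wait longer before giving up. Current jsonlite versions fetch http URLs through the curl package, and libcurl honours the standard http_proxy/https_proxy environment variables. A minimal sketch, assuming the company network provides an HTTP proxy; the address is a hypothetical placeholder that has to be replaced with the real one (the IT department or the Windows proxy settings should list it):

```r
# Hypothetical placeholder proxy address: replace with your company's proxy.
proxy <- "http://proxy.example.com:8080"

# libcurl reads these variables, so curl-based requests made later in this
# R session are routed through the proxy.
Sys.setenv(http_proxy = proxy, https_proxy = proxy)

URL <- "http://api.gios.gov.pl/pjp-api/rest/station/findAll"

# Usage once the variables are set (needs network access via the proxy):
# stations <- jsonlite::fromJSON(URL)
```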

I also tried GET(URL), as in https://www.r-bloggers.com/accessing-apis-from-r-and-a-little-r-programming/, including with the progress() and verbose() arguments as in Can't use jsonlite in R to read json format file
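
httr can also be given the proxy explicitly on each request. A minimal sketch using httr's use_proxy() and timeout() configs; fetch_json() is a hypothetical helper name, and the host and port in the usage lines are placeholders, not real values:

```r
library(httr)

# Hypothetical helper: GET `url` through an explicit proxy and return the body
# as a UTF-8 string that jsonlite::fromJSON() can parse.
fetch_json <- function(url, proxy_host, proxy_port, seconds = 60) {
  resp <- GET(url,
              use_proxy(proxy_host, proxy_port),  # route the request via the proxy
              timeout(seconds),                   # give up after `seconds`
              verbose())                          # print connection details
  stop_for_status(resp)                           # turn HTTP errors into R errors
  content(resp, as = "text", encoding = "UTF-8")
}

# Usage (placeholder proxy values; needs network access):
# txt <- fetch_json("http://api.gios.gov.pl/pjp-api/rest/station/findAll",
#                   proxy_host = "proxy.example.com", proxy_port = 8080)
# stations <- jsonlite::fromJSON(txt)
```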

EDIT

As suggested by @Junhee Shin, I have tried the following methods:

  • wget, which ran for more than 5 minutes and did not produce anything
  • internal, which caused the error: Error in download.file(URL, destfile = File_name, method = "internal") : cannot open URL 'http://api.gios.gov.pl/pjp-api/rest/station/findAll' In addition: Warning message: In download.file(URL, destfile = File_name, method = "internal") : unable to connect to 'api.gios.gov.pl' on port 80.
  • wininet, which worked (with the message Content type 'application/json;charset=UTF-8' length unknown), but fromJSON then failed with argument "txt" is missing, with no default
  • libcurl, which crashed my R session twice
  • curl, which caused the error: Warning messages: 1: running command 'curl "http://api.gios.gov.pl/pjp-api/rest/station/findAll" -o "D:\magisterka\Wroclaw Open Data\tmp.json"' had status 127 2: In download.file(URL, destfile = File_name, method = "curl") : download had nonzero exit status
  • auto, which worked (with the message Content type 'application/json;charset=UTF-8' length unknown), but fromJSON then failed with argument "txt" is missing, with no default
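
Since "wininet" and "auto" did fetch the file, the remaining failure appears to be only the fromJSON() call. The R documentation for download.file() says the Windows-only "wininet" method takes its proxy settings from the system's Internet Options, which would explain why it connected when "internal" could not. A minimal sketch combining that download with jsonlite's argument convention; read_stations() is a hypothetical helper name:

```r
URL <- "http://api.gios.gov.pl/pjp-api/rest/station/findAll"
File_name <- "tmp.json"

# Hypothetical helper: download with "wininet" (Windows only; it uses the proxy
# configured in Internet Options), then parse the saved JSON file.
read_stations <- function(url, path) {
  download.file(url, destfile = path, method = "wininet", quiet = TRUE)
  # Pass the path positionally: jsonlite::fromJSON()'s first argument is `txt`.
  jsonlite::fromJSON(path)
}

# Usage on the Windows machine where "wininet" worked:
# stations <- read_stations(URL, File_name)
```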

EDIT 2

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
AAAA
  • Try this: download.file(URL, File_name, 'wget'). The available download methods are "auto", "internal", "libcurl", "wget" and "curl". – Junhee Shin Mar 15 '18 at 10:22
  • @JunheeShin I have tried it (and described it in the EDIT), but it did not work – AAAA Mar 15 '18 at 11:09
  • I think it's a firewall issue as I had no problems using the fromJSON(URL) method. I found this https://stackoverflow.com/questions/47528321/cant-use-jsonlite-in-r-to-read-json-format-file and hope it's helpful. – TTR Mar 15 '18 at 12:37
  • @TTR I have tried it, but got the error `Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached` – AAAA Mar 16 '18 at 12:34

1 Answer


Here is another potential alternative solution, using a real browser through RSelenium:

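library(RSelenium)
library(XML)
# Start a Selenium server with Firefox in Docker (shell() is Windows-only; use system() elsewhere)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
# Let the browser request the API URL
remDr$navigate("http://api.gios.gov.pl/pjp-api/rest/station/findAll")
Sys.sleep(5)  # give the page time to load

# Grab the rendered page source and parse any HTML tables in it
page_Content <- remDr$getPageSource()[[1]]
readHTMLTable(page_Content)
remDr$close()  # end the browser session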

And here is a second alternative, which prints the response to PDF with headless Chrome and extracts the text:

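library(pdftools)
library(pagedown)
library(stringr)
library(jsonlite)
# Print the JSON response to a PDF file with headless Chrome
chrome_print("http://api.gios.gov.pl/pjp-api/rest/station/findAll",
             "C:\\...\\json_pdf.pdf")

# Extract the text of every page, join the pages and strip the line breaks
text <- pdf_text("C:\\...\\json_pdf.pdf")
text <- paste0(text, collapse = "")
text <- str_remove_all(text, pattern = "\\r?\\n")  # handles CRLF and LF
# Parse the recovered JSON string
result <- jsonlite::fromJSON(text)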
Emmanuel Hamel