I can't read a .json file in R, although I can open it in a web browser.

The data is at this URL:

https://data.kcg.gov.tw/dataset/7999ac19-e7dc-496a-9b7d-bd8daec107bd/resource/19d06299-a80c-42c2-a9b8-63d4466161a0/download/priceshistory20160101-20161231.json

Here is my code.

library(jsonlite)
link <- "https://data.kcg.gov.tw/dataset/7999ac19-e7dc-496a-9b7d-bd8daec107bd/resource/19d06299-a80c-42c2-a9b8-63d4466161a0/download/priceshistory_20160101-20161231.json"
kh <- fromJSON(link)

Error in open.connection(con, "rb") : Couldn't connect to server

Any help would be appreciated.

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
swchen
  • Are you running as administrator? – JARO Nov 28 '17 at 09:46
  • It might be some special character in your url. Check this question https://stackoverflow.com/questions/42739594/r-jsonlite-fromjson-always-returns-error-in-open-connection – Barbara Nov 28 '17 at 09:51
  • You are probably behind a firewall. If so, refer to this: https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy – amrrs Nov 28 '17 at 11:06

1 Answer

Your main error is very likely the firewall issue others have pointed out. You may be able to use httr to triage better:

library(httr)
library(jsonlite)

link <- "https://data.kcg.gov.tw/dataset/7999ac19-e7dc-496a-9b7d-bd8daec107bd/resource/19d06299-a80c-42c2-a9b8-63d4466161a0/download/priceshistory_20160101-20161231.json"

The connection worked for me, but the data has some issues (which is the main reason I posted this answer):

kh <- jsonlite::fromJSON(link)
## Error in parse_con(txt, bigint_as_char) : 
##   lexical error: invalid char in json text.
##                                        [   {     "result":{       "
##                      (right here) ------^
## In addition: Warning message:
## JSON string contains (illegal) UTF8 byte-order-mark! 

That error means the UTF-8 byte-order mark (BOM) at the start of the file wasn't removed, so we'll have to strip it ourselves.

Here's a way you can triage the connection a bit using httr::GET():

httr::GET(
  link, 
  progress(), # it's a 13MB file and the connection from North America is slow, so this helps
  verbose()   # this lets you see the connection info to make sure nothing is wrong
) -> res

This had no errors so I'm not pasting the verbose output, but you should look at the verbose output and see what HTTP errors show up. That may help diagnose any proxy/firewall issues. Using the latest curl and httr packages may also help get through this as they play nicer with Windows OS now.
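If the request itself fails, as with the "Couldn't connect to server" error in the question, it can help to capture the error message instead of stopping the script. The following is only a sketch: `safe_get()` is a hypothetical helper name, and the proxy host and port in the commented usage are placeholders, not values from this thread.

```r
# Hypothetical helper: run a request and capture any connection error
# instead of aborting. `fetch` defaults to httr::GET but can be swapped out.
safe_get <- function(url, ..., fetch = httr::GET) {
  tryCatch(
    list(ok = TRUE, response = fetch(url, ...), error = NULL),
    error = function(e) {
      list(ok = FALSE, response = NULL, error = conditionMessage(e))
    }
  )
}

# Usage (assumes httr is installed; the proxy address is a placeholder):
# res <- safe_get(link, httr::verbose())
# if (!res$ok) {
#   res <- safe_get(link, httr::use_proxy("proxy.example.com", 8080), httr::verbose())
# }
```

If the plain request fails but the one routed through your organisation's proxy succeeds, that points to a proxy/firewall issue rather than a problem with the URL.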

Back to the BOM issue, which is still likely going to be an issue for you:

hk_raw <- httr::content(res, as="raw")

hk_raw[1:10]
## [1] ef bb bf ef bb bf 5b 0a 20 20

I'm not sure why the UTF-8 BOM sequence is there twice, but that's easy to deal with (and it will need to be dealt with):

hk <- jsonlite::fromJSON(rawToChar(hk_raw[-(1:6)]))

That should give you the data structure fully read in.
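Dropping exactly six bytes assumes the file always starts with two BOMs. A slightly more defensive sketch strips however many leading UTF-8 BOMs are present; `strip_utf8_boms()` is a hypothetical helper name, not a function from jsonlite or httr.

```r
# Hypothetical helper: remove every leading UTF-8 BOM (bytes ef bb bf)
# from a raw vector, however many times it is repeated.
strip_utf8_boms <- function(x) {
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  while (length(x) >= 3 && identical(x[1:3], bom)) {
    x <- x[-(1:3)]
  }
  x
}

# Small offline demonstration with a doubled BOM, mimicking the file above
demo_raw <- c(
  as.raw(c(0xef, 0xbb, 0xbf, 0xef, 0xbb, 0xbf)),
  charToRaw('[{"a":1}]')
)
demo_txt <- rawToChar(strip_utf8_boms(demo_raw))
```

With the real download this would be used as `hk <- jsonlite::fromJSON(rawToChar(strip_utf8_boms(hk_raw)))`, which should behave the same as the line above as long as the file starts with two BOMs.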

hrbrmstr
  • Thanks for your answer, but I can't run the `GET(link)` function. It shows an error: **Error in curl::curl_fetch_memory(url, handle = handle) : Couldn't connect to server**. Does this mean I am blocked by a firewall? – swchen Dec 01 '17 at 00:43
  • Did you run it with `verbose()` as suggested? If so, what did that diagnostic output say? – hrbrmstr Dec 01 '17 at 00:48
  • I tried `verbose()` as you suggested, but it still doesn't work. The [screenshot](https://imgur.com/733lW9R) here shows the error. – swchen Dec 05 '17 at 05:59