I've worked with this package before, but after an R update I had to reinstall it, and now I can't get it to run properly.

The first issue was related to self-signed certificates, and I circumvented it with:

httr::set_config(httr::config(ssl_verifypeer = 0L))

Still, I can't get the basic functions to run. For example,

> rstatscn::statscnQueryZb(zbid = "zb", dbcode = "hgnd")
Error: lexical error: invalid char in json text.
                                       <!doctype html>  <html>  <head>
                     (right here) ------^
> rstatscn::statscnQueryData('A01',dbcode='hgyd')
Error in dataJson2df(ret, curQuery$rowcode, curQuery$colcode) : 
  Bad response from the statscn server

The first error indicates that the problem lies with the web page the package is scraping: the response should be JSON, but it is HTML. This is explained in "https://stackoverflow.com/questions/41000112/reading-a-json-file-in-r-lexical-error-invalid-char-in-json-text". Still, I don't know how to fix this.
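
One quick way to confirm this diagnosis is to check whether a response body starts like JSON before handing it to a parser. The following is a minimal sketch in base R; `looks_like_json()` is a hypothetical helper (not part of rstatscn or httr), and the test is only a rough heuristic based on the first non-blank character:

```r
# Hypothetical helper: TRUE if the text starts with "{" or "[" (ignoring
# leading whitespace), which is how a JSON object or array begins.
looks_like_json <- function(body) {
  grepl("^\\s*[\\[{]", body, perl = TRUE)
}

looks_like_json("<!doctype html>  <html>  <head>")
#> [1] FALSE
looks_like_json('{"returncode": 200}')
#> [1] TRUE
```

If this returns FALSE for the raw response text, the server is sending a web page (for example an error or redirect page) rather than data, which matches the lexical error above.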

HDanyi

1 Answer

rstatscn requests data through a web form ( https://data.stats.gov.cn/english/easyquery.htm ), and that has probably changed. Rewriting those functions to use GET instead of POST methods and to accept invalid certificates (!!!) seems to work. Apparently the data query returns something like "Sorry, no information matching the query criteria could be found", even when replacing hgyd with hgnd, so there's a chance that the site and query form have gone through some significant changes:

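library(httr2)

# Fetch the child nodes of indicator `zbid` from the indicator tree via GET.
statscnQueryZb <- function(zbid = "zb", dbcode = "hgnd"){
  request("https://data.stats.gov.cn/english/easyquery.htm") |>
    req_url_query(id = zbid, dbcode = dbcode, wdcode = "zb", m = "getTree") |>
    # Disables certificate verification (insecure; workaround for the site's certificate).
    req_options(ssl_verifypeer = 0L) |>
    req_perform() |>
    # check_type = FALSE: parse as JSON regardless of the declared content type.
    resp_body_json(check_type = FALSE, simplifyVector = TRUE) |>
    tibble::as_tibble()
}

# Query data for indicator `zb` via GET; reuses rstatscn's internal genDfwds() helper.
statscnQueryData <- function(zb = "A0201", dbcode = "hgnd", rowcode = "zb", colcode = "sj", moreWd = list(name = NA, value = NA)){
  request("https://data.stats.gov.cn/easyquery.htm") |>
    req_url_query(m = "QueryData", dbcode = dbcode, rowcode = rowcode, 
                  colcode = colcode, wds = rstatscn:::genDfwds(moreWd$name, moreWd$value), 
                  # k1: epoch seconds with "000" appended, i.e. a millisecond-style timestamp.
                  dfwds = rstatscn:::genDfwds("zb", zb), k1 = paste0(format(Sys.time(), "%s"), "000")) |>
    # Disables certificate verification (insecure; workaround for the site's certificate).
    req_options(ssl_verifypeer = 0L) |>
    req_perform() |>
    resp_body_json(check_type = FALSE, simplifyVector = TRUE) |>
    tibble::as_tibble()
}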

statscnQueryZb(zbid = "zb", dbcode = "hgnd")
#> # A tibble: 28 × 6
#>    dbcode id    isParent name                                       pid   wdcode
#>    <chr>  <chr> <lgl>    <chr>                                      <chr> <chr> 
#>  1 hgnd   A01   TRUE     General Survey                             ""    zb    
#>  2 hgnd   A02   TRUE     National Accounts                          ""    zb    
#>  3 hgnd   A03   TRUE     Population                                 ""    zb    
#>  4 hgnd   A04   TRUE     Employment and Wages                       ""    zb    
#>  5 hgnd   A05   TRUE     Investment in Fixed Assets and Real Estat… ""    zb    
#>  6 hgnd   A06   TRUE     Foreign Trade and Economic Cooperation     ""    zb    
#>  7 hgnd   A07   TRUE     Energy                                     ""    zb    
#>  8 hgnd   A08   TRUE     Finance                                    ""    zb    
#>  9 hgnd   A09   TRUE     Price Index                                ""    zb    
#> 10 hgnd   A0A   TRUE     People's Living Conditions                 ""    zb    
#> # ℹ 18 more rows

statscnQueryData('A01',dbcode='hgyd')
#> # A tibble: 1 × 2
#>   returncode returndata                          
#>        <int> <chr>                               
#> 1        501 对不起,未能找到符合查询条件的信息。

Created on 2023-04-06 with reprex v2.0.2
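
Because the server answers with a `returncode` and a message instead of an HTTP error, it can help to check that code before using the result. A minimal sketch: `stop_on_statscn_error()` is a hypothetical helper, not part of rstatscn, and treating 200 as the success code is an assumption that is not confirmed by the output above.

```r
# Hypothetical helper: raise an R error when the reply carries a non-success code.
# Assumes the `returncode` / `returndata` columns seen in the output above;
# ok_code = 200L is an assumed success value.
stop_on_statscn_error <- function(res, ok_code = 200L) {
  if ("returncode" %in% names(res)) {
    code <- res$returncode[1]
    if (!is.na(code) && code != ok_code) {
      stop("statscn returned code ", code, ": ", res$returndata[1], call. = FALSE)
    }
  }
  # Results without a returncode column (e.g. the getTree tibble) pass through.
  invisible(res)
}

# Stand-in for a failed query result, so the example runs offline.
failed <- data.frame(returncode = 501L, returndata = "no matching information")
tryCatch(stop_on_statscn_error(failed), error = function(e) conditionMessage(e))
#> [1] "statscn returned code 501: no matching information"
```

Piping the result of `statscnQueryData()` through such a check would turn the 501 reply into an explicit error rather than a one-row tibble.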

margusl
  • Thanks, margusl! It was really helpful. Both of your alternatives work properly; all you need is to use a more specific zb, like "A010101". The only issue with the original `statscnQueryData` was in formatting, which broke `json2df()`. So you're probably right, the web form might have changed. – HDanyi Apr 08 '23 at 17:45