
UPDATE:

The tl;dr is that RJSONIO is no longer the faster of the two options; rjson is now much faster.

See the comments for additional confirmation of the results.


I was under the impression that RJSONIO was supposed to be faster than rjson.
However, I am getting the opposite results.

My Question is:

  • Is there any tuning that can or should be performed to improve the results from RJSONIO? (i.e., am I overlooking something?)

Below are the comparisons using real data (where U is the contents of a JSON webpage) and then mocked-up JSON.

## REAL DATA
library(microbenchmark)
> microbenchmark(RJSONIO::fromJSON(U), rjson::fromJSON(U))

Unit: milliseconds
                  expr       min        lq    median        uq      max
1   rjson::fromJSON(U)  29.46913  30.16218  31.74999  34.11012 158.6932
2 RJSONIO::fromJSON(U) 175.11514 181.67742 186.52871 195.90646 414.6160

> microbenchmark(RJSONIO::fromJSON(U, simplify=FALSE), rjson::fromJSON(U))
Unit: milliseconds
                                    expr       min       lq    median        uq        max
1                     rjson::fromJSON(U)  27.92341  28.7430  29.60091  30.63291 1 143.9478
2 RJSONIO::fromJSON(U, simplify = FALSE) 173.30136 179.5815 183.94315 190.17245 2 328.8996

Example with Mock Data

(Similar results)

# MOCK DATA
U <- toJSON(list(1:10, LETTERS, letters, rnorm(20)))

microbenchmark(RJSONIO::fromJSON(U), rjson::fromJSON(U))
# Unit: microseconds
#                   expr     min       lq   median       uq      max
# 1   rjson::fromJSON(U)  94.788 100.8650 105.6035 111.0740 3457.479
# 2 RJSONIO::fromJSON(U) 520.131 527.7775 533.2715 555.2415  942.136

Example 2 with iris dataset

Iris.JSON <- toJSON(iris)

microbenchmark(RJSONIO::fromJSON(Iris.JSON), rjson::fromJSON(Iris.JSON))
# Unit: microseconds
#                           expr      min       lq   median       uq       max
# 1   rjson::fromJSON(Iris.JSON)  229.669  235.571  238.511  241.423   260.164
# 2 RJSONIO::fromJSON(Iris.JSON) 1209.607 1224.793 1232.165 1238.953 12039.772
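
Before reading too much into the timings, it can help to confirm what each parser actually returns, since the two packages may build differently shaped R objects by default. The following is only a minimal sketch: the JSON text and the names `json_txt`, `parsers`, and `parsed` are made up for illustration, and `simplify = FALSE` is simply the RJSONIO setting already used in the benchmark above.

```r
# Hand-written mock JSON, so the example does not depend on any toJSON().
json_txt <- '[[1,2,3],["A","B","C"],[0.5,1.5]]'

# Parsers to compare (illustrative list, keyed by package name).
parsers <- list(
  rjson   = function(x) rjson::fromJSON(x),
  RJSONIO = function(x) RJSONIO::fromJSON(x, simplify = FALSE)
)

# Skip any parser whose package is not installed.
installed <- vapply(names(parsers), requireNamespace, logical(1), quietly = TRUE)
parsed <- lapply(parsers[installed], function(f) f(json_txt))

# Print the structure of each result so that any differences are visible.
for (nm in names(parsed)) {
  cat(nm, ":\n")
  str(parsed[[nm]])
}
```

If the structures differ, part of any timing gap may come from the extra work one parser does to simplify its output rather than from the parsing itself.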

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.8 stringr_0.6.1    RJSONIO_1.0-1    rjson_0.2.11

loaded via a namespace (and not attached):
[1] plyr_1.7.1
user227710
Ricardo Saporta
  • I tested your benchmark and I can confirm the result (I used `simplify = FALSE` to get identical results). What do you expect as an answer? – agstudy Mar 09 '13 at 10:34
  • Can we have a full reproducible example? Because in my settings RJSONIO is much faster than rjson. – dickoa Mar 09 '13 at 15:57
  • @dicko A full workable example was included. It may have been missed because it was mixed in with the benchmarks. I separated it to be more visible. Also added session info. – Ricardo Saporta Mar 09 '13 at 17:03
  • @agstudy, I would have expected the results to be flipped -- ie for `RJSONIO` to have been much faster. [This is based on what I have heard about `RJSONIO` and so I'm trying to confirm if in fact it is slower or rather that I am simply doing something incorrectly] – Ricardo Saporta Mar 09 '13 at 17:08
  • @RicardoSaporta I tried and it seems that you are right and I'm somewhat surprised because with the iris data I get the opposite. Look at this : http://stackoverflow.com/questions/8216743/how-to-read-big-json – dickoa Mar 09 '13 at 17:30
  • I *think* `RJSONIO` used to be faster, but now `rjson` seems to beat it. *Even with the `iris` or bigger datasets*. Maybe it's connected to some compiler settings, although `rjson` also uses the C implementation since 0.2.7 - so this performance update should have happened about a year ago, not now. – daroczig Mar 09 '13 at 17:57
  • @daroczig, I'm not sure about its history, but currently `rjson` is beating `RJSONIO` on any dataset I am testing it on. – Ricardo Saporta Mar 09 '13 at 18:09
  • @RicardoSaporta: right, we agree on this. I just wrote about the history because I benchmarked the two packages pretty seriously a year ago in February, and `RJSONIO` seemed to perform a lot better. After that I stopped following any news about the `rjson` package, which is a shame, as in March (2012) it started to use the C implementation of the JSON parser. IMHO it became much faster at that time compared to `RJSONIO`, which already used the C lib. – daroczig Mar 09 '13 at 20:05
  • @daroczig nice and clear explanation, I think yours should have been the answer. – Michele Aug 11 '13 at 16:23
  • For anyone who finds this after 2015, I would strongly recommend `jsonlite`. – timelyportfolio Jul 10 '15 at 01:35

2 Answers

1
> library('BBmisc')
> suppressAll(lib(c('RJSONIO','rjson','jsonlite','microbenchmark')))
> U <- toJSON(list(1:10, LETTERS, letters, rnorm(20)))
> microbenchmark(
+     rjson::toJSON(U),
+     RJSONIO::toJSON(U),
+     jsonlite::toJSON(U, dataframe = "column"),
+     times = 10
+ )
Unit: microseconds
                                      expr     min      lq      mean   median      uq       max neval cld
                          rjson::toJSON(U)  65.174  68.767 2002.7007  88.2675 103.151 19179.224    10   a
                        RJSONIO::toJSON(U) 299.186 304.832  482.8038 329.7210 493.683  1351.727    10   a
 jsonlite::toJSON(U, dataframe = "column") 485.985 501.381  555.4192 548.5935 587.083   708.708    10   a

Testing system.time()

> microbenchmark(
+     system.time(rjson::toJSON(U)),
+     system.time(RJSONIO::toJSON(U)),
+     system.time(jsonlite::toJSON(U, dataframe = "column")),
+     times = 10)
Unit: milliseconds
                                                   expr      min       lq     mean   median       uq      max neval cld
                          system.time(rjson::toJSON(U)) 112.0660 115.8677 119.8426 119.8372 121.6908 132.2111    10  ab
                        system.time(RJSONIO::toJSON(U)) 115.4223 118.0262 129.2758 120.5690 148.5175 151.6874    10   b
 system.time(jsonlite::toJSON(U, dataframe = "column")) 113.2674 114.9096 118.0905 117.8401 120.9626 123.6784    10  a

Below are comparisons of a few packages. I hope these links help.

1) New package: jsonlite. A smart(er) JSON encoder/decoder.

2) Improved memory usage and RJSONIO compatibility in jsonlite 0.9.15

3) A biased comparsion of JSON packages in R
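
Note that the benchmarks in this answer time `toJSON()` (encoding), whereas the question is about `fromJSON()` (decoding). A rough sketch of a decoding comparison across the three packages might look like the following. It uses base R's `system.time()` in a loop as a plain stand-in for `microbenchmark`, so that it runs without extra packages; the JSON text, the helper `time_parse`, and the repetition count are all assumptions made up for illustration, and the numbers it prints will depend on the machine and package versions.

```r
# Hand-written JSON text, so no package's toJSON() is involved.
json_txt <- '{"id":[1,2,3],"name":["a","b","c"],"score":[0.1,0.2,0.3]}'

# Candidate parsers, keyed by package name.
parsers <- list(
  rjson    = function(x) rjson::fromJSON(x),
  RJSONIO  = function(x) RJSONIO::fromJSON(x),
  jsonlite = function(x) jsonlite::fromJSON(x)
)

# Keep only the parsers whose package is installed.
installed <- vapply(names(parsers), requireNamespace, logical(1), quietly = TRUE)
parsers <- parsers[installed]

# Hypothetical helper: elapsed seconds for n_reps calls of parser f on txt.
time_parse <- function(f, txt, n_reps = 200) {
  system.time(for (i in seq_len(n_reps)) f(txt))[["elapsed"]]
}

# One elapsed time per installed parser, printed from fastest to slowest.
timings <- vapply(parsers, time_parse, numeric(1), txt = json_txt)
print(sort(timings))
```

For serious comparisons, `microbenchmark` (as used elsewhere on this page) gives a distribution of timings rather than a single total, which makes outliers such as a slow first call easier to spot.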

0

https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html

Please try jsonlite; in my experience it is the fastest for JSON data, especially nested data.

Also see:

https://rstudio-pubs-static.s3.amazonaws.com/31702_9c22e3d1a0c44968a4a1f9656f1800ab.html
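
As a small illustration of the nested case mentioned above, here is a minimal sketch of parsing nested records with `jsonlite::fromJSON()`. The JSON text and the variable names are invented for the example; `flatten = TRUE` is an optional argument that asks jsonlite to spread nested fields into ordinary columns.

```r
# Made-up nested JSON: each record has a nested "pos" object.
nested_txt <- '[{"name":"a","pos":{"x":1,"y":2}},{"name":"b","pos":{"x":3,"y":4}}]'

# Run only if jsonlite is installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  # flatten = TRUE turns the nested "pos" fields into top-level columns.
  df <- jsonlite::fromJSON(nested_txt, flatten = TRUE)
  str(df)
}
```

This shows convenience rather than speed; whether jsonlite is also the fastest option for a given payload is worth checking with a benchmark on that data.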

Ajay Ohri