
I have a large object, produced by the survival package, that I'm trying to save to disk:

> typeof(European204cad.prs.basic)
[1] "list"

I don't want to use saveRDS because the binary output can only be read by R. I want to process the output in Perl.

The object has class coxph, and its components look like this:

European204cad.prs.basic$coefficients       European204cad.prs.basic$nevent
European204cad.prs.basic$var                European204cad.prs.basic$terms
European204cad.prs.basic$loglik             European204cad.prs.basic$assign
European204cad.prs.basic$score              European204cad.prs.basic$wald.test
European204cad.prs.basic$iter               European204cad.prs.basic$concordance
European204cad.prs.basic$linear.predictors  European204cad.prs.basic$y
European204cad.prs.basic$residuals          European204cad.prs.basic$timefix
European204cad.prs.basic$means              European204cad.prs.basic$formula
European204cad.prs.basic$method             European204cad.prs.basic$call
European204cad.prs.basic$n   

I have tried jsonlite, rlist, and rjson, and all of them failed to write my data to a file.

jsonlite:

> z <- toJSON(European204cad.prs.basic)
Error: No method asJSON S3 class: coxph

rlist:

> list.save(European204cad.prs.basic, file = 'tmp.json', type = 'JSON')
Error: No method asJSON S3 class: coxph
> list.save(European204cad.prs.basic, file = 'tmp.yaml', type = 'YAML')
Error in yaml::as.yaml(x, ...) : Unknown emitter error

and from rjson:

> z <- toJSON(European204cad.prs.basic)
Error in toJSON(European204cad.prs.basic) : 
  unable to convert R type 6 to JSON

How can I save this list to disk?

con
  • just saving in whatever format? `saveRDS`? – mnist Dec 05 '22 at 21:45
  • `saveRDS` can only be read by R. I want to read the output in Perl – con Dec 05 '22 at 21:45
  • please share some sample data – mnist Dec 05 '22 at 21:46
  • @mnist the data is huge, I don't know if I can reproduce this with a smaller set of data. I'll try to figure out a way – con Dec 05 '22 at 21:47
  • If you want to see what an object is, use `class()` rather than `typeof()`. It sounds like you are trying to save a `coxph` object which doesn't have an easy way to serialize to text. What data do you actually need to write out and read in Perl? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. These errors do not sound specific to your data. Any small dataset will likely do. – MrFlick Dec 05 '22 at 21:47
  • @MrFlick I don't know how to make a reproducible example, the data set is huge. I'm trying to figure it out – con Dec 05 '22 at 21:50
  • You could probably use the example from the `?coxph` help page. Just be clear on exactly what data you need to encode into JSON and how you want the JSON file to be structured. JSON is not a generic lossless serialization format. There are certain types of data it does not natively support. If you need to save that type of data then you need to be clear how you want to do that. – MrFlick Dec 05 '22 at 21:56
  • As MrFlick says, it's hard to know without representative data. If I was to have a stab in the dark, what about `toJSON(unclass(European204cad.prs.basic))` to try to treat your object as a basic `list`? – thelatemail Dec 05 '22 at 22:17
  • @thelatemail But the object contains things like `formula` which points to an active R environment and you can't serialize that to a string without loss of data. – MrFlick Dec 05 '22 at 22:20
  • Did you want `jsonlite::serializeJSON(European204cad.prs.basic)`? That will capture all data and attributes excluding environments. – Ritchie Sacramento Dec 05 '22 at 23:14

1 Answer


Since you want to process the output in Perl, I'm inferring that you don't want the non-data-like objects within a model. We can filter out the components we don't want and write the rest to JSON.

Using the first example from ?coxph:

library(survival)
test1 <- list(time=c(4,3,1,1,2,2,3), 
              status=c(1,1,1,0,1,1,0), 
              x=c(0,2,1,1,1,0,0), 
              sex=c(0,0,0,0,1,1,1)) 
mdl <- coxph(Surv(time, status) ~ x + strata(sex), test1) 

We can see the class of each component of mdl:

str(lapply(mdl, class))
# List of 20
#  $ coefficients     : chr "numeric"
#  $ var              : chr [1:2] "matrix" "array"
#  $ loglik           : chr "numeric"
#  $ score            : chr "numeric"
#  $ iter             : chr "integer"
#  $ linear.predictors: chr "numeric"
#  $ residuals        : chr "numeric"
#  $ means            : chr "numeric"
#  $ method           : chr "character"
#  $ n                : chr "integer"
#  $ nevent           : chr "numeric"
#  $ terms            : chr [1:2] "terms" "formula"
#  $ assign           : chr "list"
#  $ wald.test        : chr "numeric"
#  $ concordance      : chr "numeric"
#  $ y                : chr "Surv"
#  $ timefix          : chr "logical"
#  $ formula          : chr "formula"
#  $ xlevels          : chr "list"
#  $ call             : chr "call"
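As a quick check of which components are language objects rather than data, the same class test can be turned into a filter. This is only a sketch that rebuilds the toy model from ?coxph; the variable name language_like is made up for illustration.

```r
library(survival)

# Same toy data and model as above (first example from ?coxph)
test1 <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x      = c(0, 2, 1, 1, 1, 0, 0),
              sex    = c(0, 0, 0, 0, 1, 1, 1))
mdl <- coxph(Surv(time, status) ~ x + strata(sex), test1)

# Names of the components that inherit from "formula" or "call".
# unclass() makes sure the model is treated as a plain list.
language_like <- names(Filter(function(z) inherits(z, c("formula", "call")),
                              unclass(mdl)))

# For this model these are terms, formula and call
print(language_like)
```

Note that terms shows up here because its class vector includes "formula".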

It should be clear that we don't want things like formula and call (terms also inherits from formula, so it gets dropped too); we can just keep the others:

jsonlite::toJSON(Filter(function(z) !inherits(z, c("formula", "call")), mdl))
# Error: No method asJSON S3 class: Surv

Okay, mdl$y is a component you might (?) want to keep, so we will need to reclass it:

mdl$y
#  1  2  3  4  5  6  7 
#  4  3  1 1+  2  2 3+ 
dput(mdl$y)
# structure(c(4, 3, 1, 1, 2, 2, 3, 1, 1, 1, 0, 1, 1, 0), .Dim = c(7L, 
# 2L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7"), c("time", 
# "status")), type = "right", class = "Surv")

That looks like a matrix to me ...

as.matrix(mdl$y)
#   time status
# 1    4      1
# 2    3      1
# 3    1      1
# 4    1      0
# 5    2      1
# 6    2      1
# 7    3      0
mdl$y <- as.matrix(mdl$y)
jsonlite::toJSON(Filter(function(z) !inherits(z, c("formula", "call")), mdl))
# {"coefficients":[0.8023],"var":[[0.6763]],"loglik":[-3.8712,-3.3277],"score":[1.0509],"iter":[4],"linear.predictors":[-0.5731,1.0316,0.2292,0.2292,0.2292,-0.5731,-0.5731],"residuals":[-0.2631,-0.3094,0.7863,-0.2137,0.0463,0.5725,-0.6187],"means":[0.7143],"method":["efron"],"n":[7],"nevent":[5],"assign":{"x":[1]},"wald.test":[0.9518],"concordance":[3,1,2,1,0,0.6667,0.1667],"y":[[4,1],[3,1],[1,1],[1,0],[2,1],[2,1],[3,0]],"timefix":[true],"xlevels":{"strata(sex)":["sex=0","sex=1"]}} 

Note that mdl$y loses its column names in the JSON. If you want to preserve the column names of the matrix, convert it to a data frame instead:

mdl$y <- data.frame(as.matrix(mdl$y))
jsonlite::toJSON(Filter(function(z) !inherits(z, c("formula", "call")), mdl))
# {"coefficients":[0.8023],"var":[[0.6763]],"loglik":[-3.8712,-3.3277],"score":[1.0509],"iter":[4],"linear.predictors":[-0.5731,1.0316,0.2292,0.2292,0.2292,-0.5731,-0.5731],"residuals":[-0.2631,-0.3094,0.7863,-0.2137,0.0463,0.5725,-0.6187],"means":[0.7143],"method":["efron"],"n":[7],"nevent":[5],"assign":{"x":[1]},"wald.test":[0.9518],"concordance":[3,1,2,1,0,0.6667,0.1667],"y":[{"time":4,"status":1},{"time":3,"status":1},{"time":1,"status":1},{"time":1,"status":0},{"time":2,"status":1},{"time":2,"status":1},{"time":3,"status":0}],"timefix":[true],"xlevels":{"strata(sex)":["sex=0","sex=1"]}} 
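The same loss of names applies to named vectors such as coefficients, which come out as a bare array in the JSON above. If the Perl side needs to know which coefficient belongs to which term, one possible workaround (a sketch, assuming you are happy with a JSON object keyed by term name) is to convert the vector to a named list first. The names without_names and with_names are illustrative only.

```r
library(survival)
library(jsonlite)

# Same toy data and model as above (first example from ?coxph)
test1 <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x      = c(0, 2, 1, 1, 1, 0, 0),
              sex    = c(0, 0, 0, 0, 1, 1, 1))
mdl <- coxph(Surv(time, status) ~ x + strata(sex), test1)

# A named numeric vector is written as a plain JSON array (names dropped)
without_names <- toJSON(list(coefficients = mdl$coefficients))

# A named list is written as a JSON object, so the term names become keys
with_names <- toJSON(list(coefficients = as.list(mdl$coefficients)))

print(without_names)
print(with_names)
```

The same as.list() conversion could be applied to other named vectors (for example means) before calling toJSON on the filtered model.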

For archiving reasons, if you want to preserve the formula as a string, you can do that with either of the following (and then call toJSON):

## this
txt <- as.character(mdl$formula)
txt <- paste(c(txt[2], txt[-2]), collapse = " ")
mdl$formula <- txt
## or this, `paste`ing in case of multiline formulas
mdl$formula <- paste(capture.output(print(mdl$formula)), collapse = " ")
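Since the question asks about saving to disk, the steps above can be collected into one function that writes a file. This is a minimal sketch, not a definitive implementation: write_coxph_json is a hypothetical helper (it is not part of survival or jsonlite), it uses deparse() as one more way to get the formula as text, and it passes digits = NA because jsonlite otherwise rounds numbers to 4 decimal places, as seen in the output above.

```r
library(survival)
library(jsonlite)

# Same toy data and model as above (first example from ?coxph)
test1 <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x      = c(0, 2, 1, 1, 1, 0, 0),
              sex    = c(0, 0, 0, 0, 1, 1, 1))
mdl <- coxph(Surv(time, status) ~ x + strata(sex), test1)

# Hypothetical helper: filter a coxph fit and write it to a JSON file
write_coxph_json <- function(fit, path, digits = NA) {
  parts <- unclass(fit)  # treat the model as a plain list

  # Keep the formula as a single string before the filter drops it
  formula_txt <- paste(deparse(parts$formula), collapse = " ")

  # Surv response -> data frame, so the column names survive in the JSON
  if (inherits(parts$y, "Surv")) {
    parts$y <- data.frame(as.matrix(parts$y))
  }

  # Drop language objects (terms, formula, call), then add the string back
  parts <- Filter(function(z) !inherits(z, c("formula", "call")), parts)
  parts$formula <- formula_txt

  # digits = NA asks jsonlite for maximum precision
  write_json(parts, path, digits = digits)
  invisible(path)
}

out  <- write_coxph_json(mdl, tempfile(fileext = ".json"))
back <- fromJSON(out)  # read the file back as a sanity check
print(names(back))
```

Reading the file back with fromJSON is only there to confirm that the output is valid JSON; any JSON parser on the Perl side should be able to read the same file.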
r2evans