Actual question

How can I serialize objects to ASCII and unserialize them again from ASCII without having to write to and read from a file connection (i.e. from ASCII that is in-memory)?

Background

In a state-less client-server framework, I would like to make certain information persistent across calls (serialize >> send to client >> get serialized info back from client >> unserialize) without caching it on the server side.

Note that my JSON object/string also contains other, unserialized information and is thus mixed with the serialized information, which is why the approach explained in this post doesn't completely do the trick.

Now, the thing is that I would like to unserialize the object solely based on the already-read JSON string. So to speak: from "in-memory ASCII" instead of from a file connection. How would I do that?

Here's what I tried:

require(forecast)

Approach 1

## SERVER: estimates initial model and writes JSON to socket
model <- auto.arima(AirPassengers, trace = TRUE)

## Model trace:
# ARIMA(2,1,2)(1,1,1)[12]                    : Inf
# ARIMA(0,1,0)(0,1,0)[12]                    : 967.6773
# ARIMA(1,1,0)(1,1,0)[12]                    : 965.4487
# ARIMA(0,1,1)(0,1,1)[12]                    : 957.1797
# ARIMA(0,1,1)(1,1,1)[12]                    : 963.5291
# ARIMA(0,1,1)(0,1,0)[12]                    : 956.7848
# ARIMA(1,1,1)(0,1,0)[12]                    : 959.4575
# ARIMA(0,1,2)(0,1,0)[12]                    : 958.8701
# ARIMA(1,1,2)(0,1,0)[12]                    : 961.3943
# ARIMA(0,1,1)(0,1,0)[12]                    : 956.7848
# ARIMA(0,1,1)(1,1,0)[12]                    : 964.7139
# 
# Best model: ARIMA(0,1,1)(0,1,0)[12]  

fc <- as.data.frame(forecast(model))
deparsed <- deparse(model)

json_out <- list(data = AirPassengers, model = deparsed, fc = fc)
json_out <- jsonlite::toJSON(json_out)

## CLIENT: keeps estimated model, updates data, writes to socket
json_in <- jsonlite::fromJSON(json_out)
json_in$data <- window(AirPassengers, end = 1949 + (1/12 * 14))

## SERVER: reads new JSON and applies model to new data
data <- json_in$data
model_0 <- json_in$model
model_1 <- eval(parse(text = model_0))

## Model trace:
# ARIMA(2,1,2)(1,1,1)[12]                    : Inf
# ARIMA(0,1,0)(0,1,0)[12]                    : 967.6773
# ARIMA(1,1,0)(1,1,0)[12]                    : 965.4487
# ARIMA(0,1,1)(0,1,1)[12]                    : 957.1797
# ARIMA(0,1,1)(1,1,1)[12]                    : 963.5291
# ARIMA(0,1,1)(0,1,0)[12]                    : 956.7848
# ARIMA(1,1,1)(0,1,0)[12]                    : 959.4575
# ARIMA(0,1,2)(0,1,0)[12]                    : 958.8701
# ARIMA(1,1,2)(0,1,0)[12]                    : 961.3943
# ARIMA(0,1,1)(0,1,0)[12]                    : 956.7848
# ARIMA(0,1,1)(1,1,0)[12]                    : 964.7139
# 
# Best model: ARIMA(0,1,1)(0,1,0)[12]  

# Warning message:
#   In auto.arima(x = structure(list(x = structure(c(112, 118, 132,  :
#       Unable to fit final model using maximum likelihood. AIC value approximated

fc <- as.data.frame(forecast(Arima(data, model = model_1)))

## And so on ...

That works, but note that eval(parse(text = json_in$model)) actually re-runs the call to auto.arima() instead of just re-establishing/unserializing the object (note the trace information printed to the console that I included as comments).
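The re-evaluation can be reproduced on a toy object. The sketch below assumes only base R; `fit_expensive()` and the list `obj` are hypothetical stand-ins for `auto.arima()` and the fitted model. With its default settings, `deparse()` writes a stored call out as plain, unquoted code, so evaluating the resulting text runs the call again.

```r
# Hypothetical stand-in for an expensive fitting function such as auto.arima()
counter <- 0
fit_expensive <- function() {
  counter <<- counter + 1   # count how often the "fit" actually runs
  "fitted"
}

# A toy "model" that, like many fitted objects, stores the call that created it
obj <- list(coef = 0.5, call = quote(fit_expensive()))

# Deparse to text; the stored call shows up as ordinary code
txt <- paste(deparse(obj), collapse = "\n")
print(txt)

# Evaluating the text rebuilds the list, and in doing so runs fit_expensive()
restored <- eval(parse(text = txt))
print(counter)
```

After the round trip, `restored$call` no longer holds the call itself but the value the call returned, which is another sign that the object was rebuilt rather than restored.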

That's not quite what I want, as I simply want to re-establish the final model object in the fastest possible way.

That's why I turned to serialize() next.

Approach 2

## SERVER: estimates initial model and writes JSON to socket
model <- auto.arima(AirPassengers, trace = TRUE)
fc <- as.data.frame(forecast(model))
serialized <- serialize(model, NULL)
class(serialized)

json_out <- list(data = AirPassengers, model = serialized, fc = fc)
json_out <- jsonlite::toJSON(json_out)

## CLIENT: keeps estimated model, updates data, writes to socket
json_in <- jsonlite::fromJSON(json_out)
json_in$data <- window(AirPassengers, end = 1949 + (1/12 * 14))

## SERVER: reads new JSON and applies model to new data
data <- json_in$data
model_0 <- json_in$model
try(model_1 <- unserialize(model_0))
## --> error:
# Error in unserialize(model_0) : 
#   character vectors are no longer accepted by unserialize()

Unfortunately, unserialize() expects a raw vector or a connection, but after the JSON round trip the model comes back as a character vector, i.e. "plain ASCII".
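For reference, here is a minimal base-R sketch of what `serialize()` and `unserialize()` accept in memory. The list `x` is a made-up example object. With `connection = NULL`, `serialize()` returns a raw vector, and `unserialize()` takes that raw vector back directly; only the character representation is rejected.

```r
x <- list(a = 1:3, b = "text")   # made-up example object

# connection = NULL means: return the bytes as a raw vector instead of writing them
bytes <- serialize(x, NULL)
print(class(bytes))

# A raw vector can be unserialized directly, no file connection needed
restored <- unserialize(bytes)
print(identical(restored, x))

# Once the bytes have been turned into character strings, unserialize() refuses them
as_text <- as.character(bytes)
res <- try(unserialize(as_text), silent = TRUE)
print(inherits(res, "try-error"))
```

So the missing piece is a way to turn the character strings that survive the JSON round trip back into a raw vector.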

So that's why I need to do the following workaround.

Approach 3

## SERVER: estimates initial model and writes JSON to socket
model <- auto.arima(AirPassengers, trace = TRUE)
fc <- as.data.frame(forecast(model))
con <- file("serialized", "w+")
serialize(model, con)
close(con)

json_out <- list(data = AirPassengers, model = "serialized", fc = fc)
json_out <- jsonlite::toJSON(json_out)

## CLIENT: keeps estimated model, updates data, writes to socket
json_in <- jsonlite::fromJSON(json_out)
json_in$data <- window(AirPassengers, end = 1949 + (1/12 * 14))

## SERVER: reads new JSON and applies model to new data
data <- json_in$data
model_0 <- json_in$model
con <- file(model_0, "r+")
model_1 <- unserialize(con)
close(con)
fc <- as.data.frame(forecast(Arima(data, model = model_1)))

## And so on ...

Unserializing works now without the actual auto.arima() call being re-evaluated. But it's against my state-less paradigm as now the actual information is cached on the server side instead of actually being sent via the JSON object/string.

Rappster

1 Answer

Does this fit your needs?

It follows the general strategy in your Approach 2. The only difference is that it uses as.character() to convert the serialized object to a character vector before passing it to toJSON(), and then uses as.raw(as.hexmode()) to convert it back to a raw vector "on the other side". (I've marked the two edited lines with comments reading ## <<- Edited.)

library(forecast)

## SERVER: estimates initial model and writes JSON to socket
model <- auto.arima(AirPassengers, trace = TRUE)
fc <- as.data.frame(forecast(model))
serialized <- as.character(serialize(model, NULL)) ## <<- Edited
class(serialized)


json_out <- list(data = AirPassengers, model = serialized, fc = fc)
json_out <- jsonlite::toJSON(json_out)

## CLIENT: keeps estimated model, updates data, writes to socket
json_in <- jsonlite::fromJSON(json_out)
json_in$data <- window(AirPassengers, end = 1949 + (1/12 * 14))

## SERVER: reads new JSON and applies model to new data
data <- json_in$data
model_0 <- as.raw(as.hexmode(json_in$model))       ## <<- Edited

unserialize(model_0)
## Series: AirPassengers 
## ARIMA(0,1,1)(0,1,0)[12]                    
## 
## Coefficients:
##           ma1
##       -0.3184
## s.e.   0.0877
## 
## sigma^2 estimated as 137.3:  log likelihood=-508.32
## AIC=1020.64   AICc=1020.73   BIC=1026.39
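
The same idiom can be tried without the forecast package. This is a minimal sketch under the assumption that any R object behaves like the model here; `to_hex()`, `from_hex()`, and `obj` are hypothetical names introduced for illustration, not part of any package.

```r
# Serialize an object and represent each byte as a two-character hex string
to_hex <- function(object) as.character(serialize(object, NULL))

# Invert the transformation: hex strings -> integers -> raw bytes -> object
from_hex <- function(hex) unserialize(as.raw(as.hexmode(hex)))

obj <- list(id = 42L, values = c(1.5, 2.5), label = "demo")  # made-up object

hex <- to_hex(obj)
print(head(hex))          # a plain character vector, safe to embed in JSON

restored <- from_hex(hex)
print(identical(restored, obj))
```

Because `hex` is an ordinary character vector, it can sit in the same list as the unserialized fields before the whole thing is passed to `toJSON()`.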
Josh O'Brien
  • Yes! You totally made my day! I'm not really familiar with the `raw` data type and thus would have never guessed that I need to use the combination of `as.hexmode()` and `as.raw()` without your help. Thank you so much!! – Rappster Apr 28 '15 at 20:11
  • @Rappster -- Cool. Glad that helped! I got that idiom a while ago from [this old SO post](http://stackoverflow.com/a/5950338/980833) of Joris Meys', and it actually helped me understand quite a bit better what that raw data type is all about. Not directly related, but to "see" a bit more of what's going on, you can play around with statements like these: `charToRaw("ab"); serialize("ab", NULL); charToRaw("AB"); serialize("AB", NULL); serialize(list("AB"), NULL)` (and then of course figure out the code needed to invert the transformations). Cheers. – Josh O'Brien Apr 28 '15 at 20:25
  • Alternatively, in that next-to-last line, one can do `as.raw(strtoi(json_in$model, base=16))`. That's what `as.hexmode()` uses under the hood. In both cases, the essential task is to go from a two-letter character string that denotes a particular byte in hexadecimal representation **to** the integer that represents that same value in base 10 and then finally **to** the 8-bit byte that represents the same number in binary. – Josh O'Brien Nov 13 '15 at 03:40
  • One more note: IMHO, R's print method for vectors of class `"raw"` obscures the transformation that takes place on the way from character string to byte. If it (infelicitously!) printed bytes as a sequence of ones and zeros, rather than using the same two-letter hexadecimal codes that the input `"character"` vectors use, the transformation would be obvious! – Josh O'Brien Nov 13 '15 at 03:56
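
The two points from the comments above can be checked with a short base-R sketch. `show_bits()` is a hypothetical helper written for this illustration. It uses `rawToBits()`, which returns the bits least-significant first, hence the `rev()`.

```r
# Two-character hex strings for the bytes of "AB"
hex <- as.character(charToRaw("AB"))

# Both routes go: hex string -> integer -> raw byte
via_hexmode <- as.raw(as.hexmode(hex))
via_strtoi  <- as.raw(strtoi(hex, base = 16L))
print(all(via_hexmode == via_strtoi))

# Hypothetical helper: show each byte as eight binary digits, most significant first
show_bits <- function(bytes) {
  vapply(seq_along(bytes), function(i) {
    paste(rev(as.integer(rawToBits(bytes[i]))), collapse = "")
  }, character(1))
}

print(hex)                     # what the character vector holds
print(show_bits(via_strtoi))   # what the raw bytes hold
```

Printing the bits next to the hex strings makes it visible that the character vector and the raw vector hold different things, even though R prints both with the same two-letter codes.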