6

I am connecting to a MySQL database via R and some of the data that the database looks like the following:

a:1:{s:17:\"last_cart_refresh\";i:1470188219;}

My understanding is that this is PHP serialized data using the PHP serialize() function. Is there a way within R (either native or with a package) to make sense of this data?

EDIT: added clarification that I want a way in R not PHP to be able to read the data.

Dan
  • 2,625
  • 5
  • 27
  • 42
  • 1
    yes, the counterpart is [`unserialize`](http://php.net/manual/en/function.unserialize.php) – Kevin Aug 08 '16 at 01:42
  • 1
    @Ghost I think he means within R ;-) but you're 100% right PHP side! – Darren Aug 08 '16 at 01:46
  • @Ghost, apologies for the ambiguity. I have added clarification on needing a function within R, not PHP. – Dan Aug 08 '16 at 01:47
  • oh okay, you'd want this solved inside R, is this even related? http://stackoverflow.com/questions/1395115/storing-r-objects-in-a-relational-database – Kevin Aug 08 '16 at 01:47
  • 1
    @Darren lol :D added the wrong manual, maybe its time to sleep – Kevin Aug 08 '16 at 01:50
  • 1
    @Ghost haha, orrrrrr a coffee (*maybe 10?*) ;-P – Darren Aug 08 '16 at 01:51
  • There is a [phpserialize](https://cran.r-project.org/src/contrib/Archive/phpSerialize/) lib for R, not sure if there is also the unserialize function there. Check and update. – Dekel Aug 08 '16 at 01:51
  • @Darren, I really don't think it's related to php's serialize function in any way. – Dekel Aug 08 '16 at 01:53
  • The R implementation of serialization is very different to how PHP does it. for example if I serialise (in R) the value 1470188219 the outcome is a value that looks like this `58 0a 00 00 00 02 00 03 02 02 00 02 03 00 00 00 00 0e 00 00 00 01 41 d5 e8 52 ae c0 00 00` – Dan Aug 08 '16 at 01:55
  • @dekel, it appears that the package you mention is no longer in use, and isn't supported for my version of R (3.2.2). The last release to CRAN was in 2011, which may be an indication of its support. – Dan Aug 08 '16 at 01:58
  • Year, it's old, I know. Hoped it could still work... – Dekel Aug 08 '16 at 02:01
  • 1
    Well, if you don't find any package, and you know python, you can use [this lib](https://github.com/mitsuhiko/phpserialize/blob/master/phpserialize.py) as a reference to create your own unserialize function – Dekel Aug 08 '16 at 02:04
  • I don't know python, however it appears that i might be learning the basics very soon. – Dan Aug 08 '16 at 02:07

1 Answers1

4

To use php serialized data in R, you can use this function:

php_unserialize <- function(string){

  first <- unlist(strsplit(string, "\\{|\\}", fixed=F))
  inside_array <- unlist(strsplit(first[-1], ";", fixed=T))
  infomation_type <- substr(inside_array, 1,1)

  if(any(nchar(gsub("s|i", "", unique(infomation_type) )) != 0)){
    stop("unknow datatype in serilize data")
  }
  inside_array_s <- rep(NA, length(inside_array))

  pos <- infomation_type == "s"
  string_length <- as.numeric(sapply(strsplit(inside_array, ":", fixed=T), function(x) x[2]))[pos]
  inside_array_s[pos] <- substr(inside_array[pos], nchar(string_length)+4, nchar(inside_array[pos]))

  pos <- infomation_type == "i"
  inside_array_s[pos] <- substr(inside_array[pos],3,nchar(inside_array[pos]))

  # create key and value for each elment
  key <- inside_array_s[seq(1,length(inside_array_s),2)]
  value <- inside_array_s[seq(2,length(inside_array_s),2)]  

  return(cbind(key, value))
}

The function returns a matrix with the key's and value's of the php-array.

string <- 'a:1:{s:17:\"last_cart_refresh\";s:17:\"last_cart_refresh\";}'
php_unserialize(string)

If your php array contain other arrays or even nested arrays, use this more complex function. The result is than a list:

php_unserialize <- function(string){

  first <- unlist(strsplit(string, "\\{|\\}", fixed=F))
  inside_array <- unlist(strsplit(first[-1], ";", fixed=T))
  infomation_type <- substr(inside_array, 1,1)

  if(any(nchar(gsub("a|s|i", "", unique(infomation_type) )) != 0)){
    stop("unknow datatype in serilize data")
  }


  # element info: key & value ####
  # mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm

  inside_array_s <- rep(NA, length(inside_array))

  pos <- infomation_type == "s"
  string_length <- as.numeric(sapply(strsplit(inside_array, ":", fixed=T), function(x) x[2]))[pos]
  inside_array_s[pos] <- substr(inside_array[pos], nchar(string_length)+4, nchar(inside_array[pos]))

  pos <- infomation_type == "i"
  inside_array_s[pos] <- substr(inside_array[pos],3,nchar(inside_array[pos]))

  # create key and value for each elment
  key <- inside_array_s[seq(1,length(inside_array_s),2)]
  value <- inside_array_s[seq(2,length(inside_array_s),2)]


  # map array stuctur ####
  # mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm

  # create map1: sub-array beginning and sub-array lengths
  pos <- which(infomation_type == "a")

  if(length(pos) > 0){

    sa_begin <- pos/2
    sa_len <- as.numeric(substr(inside_array[pos],3,nchar(inside_array[pos])-1))
    array_depth0 <- cbind(sa_begin, sa_len)

    z=1
    # in case of sub arrays shorten the maped sub-array length
    if(nrow(array_depth0) > 1){

      for(z in 1:(nrow(array_depth0)-1)){
        tmp <- array_depth0
        val <- tmp[z,1] + tmp[z,2]

        while(val >= tmp[z+1,1]){
          array_depth0[z,2] <- array_depth0[z,2] + tmp[z+1,2]
          val <- tmp[z,1] + array_depth0[z,2]
          tmp <- tmp[(z+1)*-1,]
          if(nrow(tmp) <= z) {
            break
          }
        }
      }
    } 


    # map2: for each element of the data the maximum depth
    array_depth <- rep(1, length(inside_array)/2)
    for(z in 1:nrow(array_depth0)){
      pos <- (array_depth0[z,1]+1) : (array_depth0[z,1] + array_depth0[z,2])  
      array_depth[pos] <- array_depth[pos] + 1  
    }


    # add to map1: the array depth for each sub-array  
    array_depth0 <- cbind(array_depth0, NA)
    for(z in 1:nrow(array_depth0)){
      pos <- (array_depth0[z,1]+1) : (array_depth0[z,1] + array_depth0[z,2])
      array_depth0[z,3] <- min(array_depth[pos])
    }


    # create map3: elements of array
    out <- NULL
    for(z in 1:nrow(array_depth0)){
      a_memb <- rep(F, length(array_depth))
      a_memb[(array_depth0[z,1] + 1) : (array_depth0[z,1] + array_depth0[z,2])] <- T 
      out <- cbind(out, a_memb)
    }
    a_memb <- out


    # some uggly fix in map1 and map3: add the whole array
    a_memb <- cbind(a_memb, T)
    array_depth0 <- rbind(array_depth0, c(1,length(value),1))

  } else {
    a_memb <- matrix(T, ncol=1, nrow=length(key))
    array_depth0 <- matrix(c(1,length(key),1), nrow=1)
  }



  # build ####
  # mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm

  # for the deppest array 
  # need prepare for the main loop
  deppest <- which(max(array_depth0[,3])==array_depth0[,3])
  anchor_tmp <- array_depth0[deppest,1]
  content_tmp <- list()
  for(y in 1:ncol(as.matrix(a_memb[,deppest]))){
    pos <- which(as.matrix(a_memb[,deppest])[,y])
    all_content <- as.list(value[pos])
    content_tmp[[y]] <- setNames(all_content, key[pos])
  }
  content_tmp2 <- list()

  if(max(array_depth0[,3]) > 1){

    # construct the list from the deppest array level 
    for(y in (max(array_depth0[,3]) - 1 ) : 1){
      deppest <- which(y==array_depth0[,3])

      content_tmp2 <- list()
      x=1
      for(x in 1:ncol(as.matrix(a_memb[,deppest]))){
        pos <- which(as.matrix(a_memb[,deppest])[,x])
        all_content <- as.list(value[pos])

        if(any(is.na(all_content))){
          pos <- which(array_depth == y & as.matrix(a_memb[,deppest])[,x])
          all_content <- as.list(value[pos])

          for(z in 1:sum(is.na(all_content))){
            pos_a <- pos[is.na(all_content)][1]
            cot <- content_tmp[[which(anchor_tmp==pos_a)]]
            pos_b <- which(is.na(all_content))[1]
            all_content[[pos_b]] <- cot
          }
        }
        content_tmp2[[x]] <- setNames(all_content, key[pos])
      }
      content_tmp <- content_tmp2
      anchor_tmp <- array_depth0[deppest,1]
    }
  } else {
    content_tmp2 <- content_tmp
  }

  return(content_tmp2[[1]])
}
and-bri
  • 1,563
  • 2
  • 19
  • 34
  • The nested version does not handle objects like this: `'a:5:{s:6:"result";a:4:{i:0;a:3:{s:6:"number";s:4:"7717";s:7:"correct";i:1;s:6:"chosen";i:1;}i:1;a:3:{s:6:"number";s:4:"7718";s:7:"correct";i:1;s:6:"chosen";i:1;}i:2;a:3:{s:6:"number";s:4:"7719";s:7:"correct";i:1;s:6:"chosen";i:1;}i:3;a:3:{s:6:"number";s:4:"7720";s:7:"correct";i:0;s:6:"chosen";i:0;}}s:8:"actgrade";d:1;s:5:"retry";i:1;s:10:"retattempt";s:5:"21292";s:4:"cmid";s:3:"985";}'` – user3072843 Oct 26 '22 at 14:27