I am stashing chunks of log data in memcache to later throw into a database. On each request to the server I save an array of data using memcached::append(), using newlines to delimit the chunks. A simplified version would look like this:

$myCache->append('log', serialize($myArray)."\n");

Later, when I want to build my query, I pull all the rows out of the cache and unserialize each one:

$dataToInsert = explode("\n", $myCache->get('log'));
$dataToInsert = array_map(function($row) {
    return unserialize($row);
}, $dataToInsert);

This works fine with the built-in serialize() and unserialize(), but I'd like to take advantage of igbinary's obvious strengths - size and speed. Unfortunately when I substitute the igbinary versions of the functions, I get errors.

It appears that the igbinary-serialized data can contain "\n" characters, so when I explode the stashed data it creates partial rows that of course fail.

Is there a delimiter that I can use besides newline to separate the blocks of igbinary data, or are igbinary and append() fundamentally incompatible?

Jerry
  • You could try an underscore `_`, just because it naturally isn't a delimiter so may work... not tested and not done it before but probs worth a try. – An0nC0d3r Oct 26 '15 at 21:38
  • I was a little imprecise when I said the binary data contains newline characters; since it's binary it doesn't have characters at all. But explode() will interpret any `0A` byte as a newline. I guess my question boils down to, 'Is there any byte combination that will not be in the igbinary data that I can explode on? Or perhaps a method besides explode() to use to retrieve the discrete data blocks?' – Jerry Oct 26 '15 at 22:18

1 Answer

Since igbinary stores binary data as-is, there is no guarantee of any character being available for use: you can serialize a string or integer containing any byte, any character.

memcached itself only supports adding, removing, and replacing whole values, plus appending or prepending to strings; it has no notion of separate records inside a single value.

Two ways come to mind to keep the logged data out of PHP's memory and in memcached until the SQL query is built:

  • use multiple keys: 'log1', ..., 'logN' and keep track of N.
  • reserve a character for yourself by escaping the binary output of the serialization (and unescaping before deserialization).
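The first option (one key per chunk plus a counter) could be sketched roughly as below. This is only an illustration under assumptions: `FakeCache` is a hypothetical in-memory stand-in that mimics the `add`, `increment`, `set`, and `get` methods of `Memcached` so the sketch runs without a server, the key names `logCount` and `logN` are made up, and `serialize()` stands in for `igbinary_serialize()` in case the extension is not installed.

```php
<?php
// Hypothetical in-memory stand-in exposing the subset of Memcached
// methods used below, so the sketch runs without a server.
final class FakeCache
{
    private array $data = [];

    public function add(string $key, $value): bool
    {
        if (array_key_exists($key, $this->data)) {
            return false;                 // like Memcached::add, refuse to overwrite
        }
        $this->data[$key] = $value;
        return true;
    }

    public function increment(string $key, int $offset = 1)
    {
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        return $this->data[$key] += $offset;
    }

    public function set(string $key, $value): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    public function get(string $key)
    {
        return $this->data[$key] ?? false;
    }
}

// Store one chunk under its own key: 'log1', 'log2', ...
function logChunk($cache, array $chunk): void
{
    $cache->add('logCount', 0);                 // no-op when the counter already exists
    $n = $cache->increment('logCount');         // new counter value
    $cache->set('log' . $n, serialize($chunk)); // swap in igbinary_serialize() if available
}

// Read back every chunk from 'log1' up to the current counter value.
function readChunks($cache): array
{
    $rows = [];
    $n = (int) $cache->get('logCount');
    for ($i = 1; $i <= $n; $i++) {
        $blob = $cache->get('log' . $i);
        if ($blob !== false) {                  // a key may be missing (e.g. evicted)
            $rows[] = unserialize($blob);
        }
    }
    return $rows;
}
```

Because every chunk lives in its own value, no delimiter is needed at all; the cost is one extra counter update per request and N reads when building the query.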

The reservation could be done like this:

str_replace( "\n", "\n1", $data ) . "\n0"

This makes sure that every \n in the output is followed by either a 0 (end of a record) or a 1 (an escaped \n that belongs to the data).

I'm not replacing \n with \n\n because this won't work well if $data starts or ends with \n.

So:

$myCache->append('log', str_replace("\n", "\n1", igbinary_serialize($myArray)) . "\n0");

Splitting the data is then done using \n0, and the \n1 is unescaped back to \n:

$rows = explode("\n0", $myCache->get('log'));
array_pop($rows); // the final "\n0" leaves one empty trailing element
$dataToInsert = array_map(function($row) {
    return igbinary_unserialize(str_replace("\n1", "\n", $row));
}, $rows);
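To check the edge cases mentioned above (data that starts or ends with \n, or that happens to contain \n0 or \n1), the escaping can be pulled into two small helpers and tried on plain byte strings. The function names `packRecord` and `unpackRecords` are made up for this sketch, and the strings stand in for igbinary output, so it runs without the extension or a memcached server.

```php
<?php
// Escape every "\n" as "\n1" and terminate the record with "\n0".
function packRecord(string $blob): string
{
    return str_replace("\n", "\n1", $blob) . "\n0";
}

// Split on the "\n0" terminator and undo the escaping.
function unpackRecords(string $stored): array
{
    $parts = explode("\n0", $stored);
    array_pop($parts); // the final terminator leaves one empty trailing element
    return array_map(function ($part) {
        return str_replace("\n1", "\n", $part);
    }, $parts);
}

// Awkward inputs: trailing newline, literal "\n0" and "\n1", a lone newline, empty data.
$blobs  = ["plain", "ends with newline\n", "\n0 and \n1 inside", "\n", ""];
$stored = implode('', array_map('packRecord', $blobs)); // what append() would accumulate
$roundTripOk = (unpackRecords($stored) === $blobs);
```

In real use, `packRecord(igbinary_serialize($myArray))` would be the value passed to append(), and each element returned by `unpackRecords()` would go through igbinary_unserialize().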
Kenney
  • Perfect. Thanks! I had started to write code to length-prefix each blob of binary data, but this feels nicer. – Jerry Oct 27 '15 at 01:07
  • That would probably be faster and more efficient. I think your solution is the best answer :-) – Kenney Oct 27 '15 at 02:50
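The length-prefix idea from these comments could look roughly like the sketch below. It is not code from either poster: the names `frameRecord` and `unframeRecords` and the choice of a 4-byte big-endian length (`pack('N', ...)`) are assumptions made for illustration.

```php
<?php
// Prefix each blob with its byte length as a 4-byte big-endian unsigned integer.
function frameRecord(string $blob): string
{
    return pack('N', strlen($blob)) . $blob;
}

// Walk the accumulated string, reading a length and then that many bytes.
function unframeRecords(string $stored): array
{
    $records = [];
    $offset  = 0;
    $total   = strlen($stored);
    while ($offset + 4 <= $total) {
        $length  = unpack('N', substr($stored, $offset, 4))[1];
        $offset += 4;
        if ($offset + $length > $total) {
            break; // truncated final record; simply ignored in this sketch
        }
        $records[] = substr($stored, $offset, $length);
        $offset   += $length;
    }
    return $records;
}
```

With this framing, `frameRecord(igbinary_serialize($myArray))` would be appended as-is, with no scanning or replacing of the payload, and reading needs no explode() because record boundaries come from the lengths rather than from a delimiter.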