5

I have the offset error like many on the website after storing object into database and than retrieving it. If I don't store it everything works fine:

$serializedObject = serialize($this);
$unSerializedObject = unserialize($serializedObject);

Besides, I use base64 encoding when saving data and retrieving it from database, but this doesn't help. I don't do any escaping though. My object processes some string. What I've found out is that with this string:

A woman is travelling around the world. She is 28 years old and she is from Great Britain.
She cannot use a car or a plane on her

It works fine. But when I add one more space and word [ journey], the error pops up. Here is the string with this one word:

A woman is travelling around the world. She is 28 years old and she is from Great Britain.
She cannot use a car or a plane on her journey

My question is why does the error pop up?

Here is the output of the serialize($this) run against the text without the word journey

Here is the output of the serialize($this) run against the text with the word journey

UPDATE

The table I'm saving object into has charset utf-8 and the column with no charset defined since it's of BLOB type. The mb_detect_encoding(serialize($this)) returns UTF-8

There is no escaping for the $sql. This is how the query is executed inside Kohana framework that I'm using:

$result = mysql_query($sql, $this->_connection)
Max Koretskyi
  • 101,079
  • 60
  • 333
  • 488
  • This function check only simple variable length, it does not work well with private/public properties of an object. – dev-null-dweller Oct 19 '13 at 18:24
  • @dev-null-dweller, so what you're saying it shows incorrect data if my object has private properties? – Max Koretskyi Oct 19 '13 at 18:25
  • @Maximus, you mentioned that you are using Kohana. Are you saving it via an object that extends a Kohana_ORM record? Also, could you paste in the contents of `echo serialize($this)` and `echo bin2hex(serialize($this))`. (via something http://pastebin.com/ if it's really big.) – Matthew Oct 20 '13 at 15:58
  • @Matthew, no I'm doing it manually `DB::query(Database::INSERT, $sql)->execute();`. [Here is](http://pastebin.com/1MZCEwd2) the pasted serialized object. – Max Koretskyi Oct 20 '13 at 16:03
  • @Maximus, thanks, could you also do the bin2hex version? (I had edited my comment to include that.) – Matthew Oct 20 '13 at 16:07
  • @Matthew, done [here](http://pastebin.com/4kaCUmCh). Hope you're not using the information in the question (output of the function), since the data I pasted is different – Max Koretskyi Oct 20 '13 at 16:12
  • @Maximums, thanks, I'll take a look at it later. I'm familiar with Kohana, so I'll see what it takes to get it to work. I assume the problem you are having is that this type of serialized data has null bytes (i.e., `chr(0)`) and it is not being stored/retrieved properly. The base64 encoding just works around that issue because the DB is then only storing alphanumeric characters (plus a few safe symbols). – Matthew Oct 20 '13 at 16:17
  • @Matthew, thanks! base64 encoding doesn't help with the data I pasted – Max Koretskyi Oct 20 '13 at 16:19
  • @Matthew, updated my question with new findings and new data – Max Koretskyi Oct 20 '13 at 17:33
  • @Maximums, I've updated my answer with some example Kohana code. Hopefully it helps; there's not much more I can add to the discussion. – Matthew Oct 20 '13 at 17:50
  • @Matthew, I increased the size of the field to the MEDIUMBLOB and that was it! It worked! Greatly appreciate your help! The only thing now is to figure out why was it failing. It says offset 49151 of 49151 bytes, so it hasn't reach it's potential of 65535 bytes, right?. – Max Koretskyi Oct 20 '13 at 17:59
  • @Matthew, maybe it's because the string is stored as UTF-8 characters? – Max Koretskyi Oct 20 '13 at 18:06
  • 1
    @Maximus, yes a UTF8 char may take 2 or more bytes when stored in a blob/text. (This is not true of a CHAR/VARCHAR where the size is in characters. Well, it still consumes those extra bytes, but that is accounted for in the VARCHAR's length.) – Matthew Oct 20 '13 at 21:32

2 Answers2

8

Original answer:

A TEXT field in MySQL stores up to 65535 bytes, so my guess is it is being truncated there.

Use a MEDIUMTEXT or LONGTEXT instead.

In addition to that, there are potential issues with how you get the data into and out of the database. PHP serialized strings can contain null bytes (the byte 0) and that appears to be what is not getting transfered properly.

One way to work around that is to encode the string via something like base64_encode() that uses a very friendly alphanumeric/symbol alphabet. That will solve your problems if you increase your BLOB type to MEDIUMBLOB or LONGBLOB.

However, if you are properly sending your queries to the database you can safely send the original string. Since you are using Kohana, here is a sample that works perfectly fine for me.

Short version:

$sql = 'INSERT INTO serialized_object (data) VALUES (:data)';
DB::query(Database::INSERT, $sql)->
  param(':data', $serialization)->
  execute();

Code:

<?php 
class Article {}
class Word {}

class Controller_Welcome extends Controller
{
    public function action_index()
    {
        $object = unserialize(hex2bin(file_get_contents('/tmp/data.hex')));
        $serialization = serialize($object);

        $sql = 'INSERT INTO serialized_object (data) VALUES (:data)';
        DB::query(Database::INSERT, $sql)->
            param(':data', $serialization)->
            execute();

        $saved_length = DB::query(Database::SELECT, '
            SELECT LENGTH(data) AS l
            FROM serialized_object
            ORDER BY id DESC
            LIMIT 1
        ')->execute()->get('l');

        if ($saved_length != strlen($serialization))
        {
            throw new Exception("Database length is incorrect. Value is corrupted in database.");
        }

        $saved_serialization = DB::query(Database::SELECT, '
            SELECT data
            FROM serialized_object
            ORDER BY id DESC
            LIMIT 1
        ')->execute()->get('data');

        $saved_object = unserialize($saved_serialization);

        if (!$saved_object)
        {
            throw new Exception("Unable to unserialize object.");
        }

        if ($saved_object != $object)
        {
            throw new Exception("Saved object is not equal to object.");
        }

        $this->response->body('Everything is fine.');
    }

} 

database.php:

<?php

return array
(
    'default' => array(
        'type'       => 'PDO',
        'connection' => array(
            /**
             * The following options are available for PDO:
             *
             * string   dsn         Data Source Name
             * string   username    database username
             * string   password    database password
             * boolean  persistent  use persistent connections?
             */
            'dsn'        => 'mysql:host=127.0.0.1;dbname=test',
            'username'   => 'root',
            'password'   => '****',
            'persistent' => FALSE,
        ),
        /**
         * The following extra options are available for PDO:
         *
         * string   identifier  set the escaping identifier
         */
        'table_prefix' => '',
        'charset'      => 'utf8',
        'caching'      => FALSE,
    ),
);

Schema:

CREATE TABLE `serialized_object` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` longblob NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Matthew
  • 47,584
  • 11
  • 86
  • 98
  • Thanks for your reply, but the problems where detected before I even stored the object into database. I'm using a BLOB field in database, if your are talking about the field type. – Max Koretskyi Oct 19 '13 at 18:11
  • 1
    Just for information: `TINYTEXT` stores 254 Bytes, `TEXT` stores 64KiB minus 2 bytes, `MEDIUMTEXT` stores 16 MiB minus 3 bytes and `LONGTEXT` stores 4 GiB minus 4 bytes. – Cobra_Fast Oct 19 '13 at 18:12
  • A blob has the same storage size. Maybe you aren't properly escaping your data when saving to the database. I don't see any evidence that your unserialize is failing outside using a database. – Matthew Oct 19 '13 at 19:27
3

The problem is fixed with this approach:

$toDatabse = base64_encode(serialize($data));  // Save to database
$fromDatabase = unserialize(base64_decode($data)); //Getting Save Format 

But the question remains, what are the root causes for errors found by Baba's function findSerializeError?

Community
  • 1
  • 1
Max Koretskyi
  • 101,079
  • 60
  • 333
  • 488
  • This implies that you are not saving the data properly to the database. `base64_encode` only uses "safe" characters which is probably working around a faulty encoding that is happening when you go into the database. – Matthew Oct 19 '13 at 20:29
  • @Matthew, So how do I figure out what the problem is? – Max Koretskyi Oct 20 '13 at 07:29
  • Update your question with 2 things: how you connect to the database (password/username is irrelevant) and how you send the query. I would guess that either the charset is wrong or you aren't escaping the data (e.g., `mysql_real_escape_string`). – Matthew Oct 20 '13 at 14:56
  • @Matthew, I updated my question with information regarding charset. I'm using Kohana framework, so it takes some time for me to figure out how it connects to database and it sends the query. You're right, I'm not escaping the data, where should I do it? – Max Koretskyi Oct 20 '13 at 15:49