OK, I have this:

a:1:{i:0;a:3:{s:7:"address";s:52:"Elågåresgude 41, 2200 Københamm N";s:12:"company_name";s:14:"Kaffe og Kluns";s:9:"telephone";s:0:"";}}

This does not work with unserialize($string);

I know where the error is. It's the number in front of the address. It should not be 52, but 36.

I got this number by counting the characters in the string (which gave me 33) and then adding 1 for each å or ø in the string.

When I replace 52 with 36, it unserializes just fine.
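
The count above can be checked directly. This is only a minimal sketch, assuming the script file is saved as UTF-8 and the mbstring extension is available; the variable names are illustrative:

```php
<?php
// Assumption: this source file (and therefore the literal) is UTF-8 encoded.
$address = 'Elågåresgude 41, 2200 Københamm N';

$chars = mb_strlen($address, 'UTF-8'); // number of characters
$bytes = strlen($address);             // number of bytes; serialize() records this

echo "$chars characters, $bytes bytes\n"; // 33 characters, 36 bytes
echo serialize($address), "\n";           // s:36:"Elågåresgude 41, 2200 Københamm N";
```

Each å and ø takes two bytes in UTF-8, which is where the extra 3 comes from.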

Now I would like to write a script to do this for all my addresses.

But how can I even do this? How do I extract the address/company_name/telephone strings when the data is "corrupted"?

Karem
  • You could json_encode instead.... – Flukey Nov 13 '11 at 18:06
  • @Flukey: This is not JSON, OP wants to **deserialize**, not decode from JSON. – Tadeck Nov 13 '11 at 18:08
  • How do you get those corrupted strings? You should rather focus on the source of the problem than the error itself, if possible that is. – Marcus Nov 13 '11 at 18:11
  • @Tadeck - Yes, I know. However, you are less likely to run into these serialization problems if you use json_encode instead. Furthermore, using json is better for portability. If OP has a db with lots of serialized php arrays and then one day he decides to switch over to python, he'll be somewhat.....screwed. – Flukey Nov 13 '11 at 18:14
  • Looks like you may have an encoding issue, check out http://php.net/manual/en/function.unserialize.php and search for 'utf-8', there is a user contributed function that may help. – Mike Purcell Nov 13 '11 at 18:19
  • no problem here : http://codepad.viper-7.com/lLtAc0 – malletjo Nov 13 '11 at 18:28

4 Answers

4
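function fix_corrupted_serialized_string($serialized) {
    // Split just before each string value: every piece after the first
    // starts with a value, and the piece before it ends with that value's length.
    $parts = explode(':"', $serialized);
    $count = count($parts);
    for ($i = 1; $i < $count; $i++) {
        // The value runs up to the next double quote
        // (so values that themselves contain " are not handled).
        list($value) = explode('"', $parts[$i]);
        // Overwrite the declared length with the real byte length.
        $fields = explode(':', $parts[$i - 1]);
        $fields[count($fields) - 1] = strlen($value);
        $parts[$i - 1] = join(':', $fields);
    }
    return join(':"', $parts);
}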

working demo: http://codepad.viper-7.com/GNbM25

Peter
  • Good for the "length" problem, but like Marcus said, he should focus on the source of the problem. – malletjo Nov 13 '11 at 18:18
  • The question is about the length problem, and this is an answer to it :) Anyway, `serialize()` is only good for caching. I hate people who store data in this format (for example, 80% of WordPress plugins) – Peter Nov 13 '11 at 18:23
  • Thank you, this worked great! @racar I have already fixed the source of the problem – Karem Nov 13 '11 at 18:36
  • So now instead of trying to resolve the root of the problem, you are going to add the overhead of an extra function? – Mike Purcell Nov 13 '11 at 18:48
  • @DigitalPrecision Eh, no, I have resolved the root of the problem, but the damage the problem did is what I was looking to fix. Which I have now, thanks to Peter's answer. Sorry, I forgot to accept – Karem Nov 13 '11 at 19:15
  • @Karem: Ah. Glad you were able to fix the root problem. – Mike Purcell Nov 13 '11 at 19:23
0

This problem is a classic case of someone trying to perform a shortcut when updating a value in a serialized string. The lesson swiftly learned to avoid this headache is to unserialize your data, modify your value(s), then re-serialize it.
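
As a minimal sketch of that unserialize, modify, re-serialize round trip (the stored record here is hypothetical sample data, not the asker's actual rows):

```php
<?php
// Hypothetical stored record, for illustration only.
$stored = serialize([
    ['address' => 'Old Street 1', 'company_name' => 'Kaffe og Kluns', 'telephone' => ''],
]);

$data = unserialize($stored);                              // 1. unserialize
$data[0]['address'] = 'Elågåresgude 41, 2200 Københamm N'; // 2. modify the value
$stored = serialize($data);                                // 3. re-serialize

// serialize() recomputes every s:N: byte count, so the result stays valid.
var_dump(unserialize($stored) === $data); // bool(true)
```

Because the string is never edited by hand, the byte counts cannot drift out of sync with the values.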

I feel regular expressions afford a more direct approach for trying to parse the corrupted serialized string. To be perfectly clear, my snippet will only update the byte/character counts; if you have a serialized string that is corrupted by some other means, this will not be the remedy.

Here is a simple preg_replace_callback() call that only captures the value substring and unconditionally replaces all byte counts in the serialized string:

Code: (Demo)

$corrupted_byte_counts = <<<STRING
a:1:{i:0;a:3:{s:7:"address";s:52:"Elågåresgude 41, 2200 Københamm N";s:12:"company_name";s:14:"Kaffe og Kluns";s:9:"telephone";s:0:"";}}
STRING;

$repaired = preg_replace_callback(
    '/s:\d+:"(.*?)";/s',
    function ($m) {
        return 's:' . strlen($m[1]) . ":\"{$m[1]}\";";
    },
    $corrupted_byte_counts
);

echo "corrupted serialized array:\n$corrupted_byte_counts";
echo "\n---\n";
echo "repaired serialized array:\n$repaired";
echo "\n---\n";
print_r(unserialize($repaired));

Output:

corrupted serialized array:
a:1:{i:0;a:3:{s:7:"address";s:52:"Elågåresgude 41, 2200 Københamm N";s:12:"company_name";s:14:"Kaffe og Kluns";s:9:"telephone";s:0:"";}}
---
repaired serialized array:
a:1:{i:0;a:3:{s:7:"address";s:36:"Elågåresgude 41, 2200 Københamm N";s:12:"company_name";s:14:"Kaffe og Kluns";s:9:"telephone";s:0:"";}}
---
Array
(
    [0] => Array
        (
            [address] => Elågåresgude 41, 2200 Københamm N
            [company_name] => Kaffe og Kluns
            [telephone] => 
        )

)

I've even gone a bit further to address a possible fringe case. Without implementing the pattern extension in that link, the above snippet will work as desired on strings with:

  • multibyte characters
  • newlines
  • colons
  • semicolons
  • commas
  • single quotes
  • double quotes

It only breaks when a string to be matched contains "; -- in which case, my above link attempts to address that possibility.
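
To make that limitation concrete, here is a small sketch that runs the same pattern over a valid serialized string whose value contains the sequence "; (the sample value is made up for illustration):

```php
<?php
// A valid serialized string whose value contains the sequence ";
$valid = serialize('a";b');

$repaired = preg_replace_callback(
    '/s:\d+:"(.*?)";/s',
    function ($m) {
        return 's:' . strlen($m[1]) . ":\"{$m[1]}\";";
    },
    $valid
);

// The lazy (.*?) stops at the first "; so only `a` is captured
// and the byte count is rewritten from 4 to 1.
echo $valid, "\n";    // s:4:"a";b";
echo $repaired, "\n"; // s:1:"a";b";
```

In other words, the pattern can damage a string that was fine to begin with, so it is probably safest to apply it only to strings that already fail to unserialize.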

mickmackusa
0

Looks like a bug in how the function deals with multi-byte characters. You might also want to try explicitly encoding the string as UTF-8 before serializing it.

As a workaround, you could base64 encode the address before serializing it, then base64 decode it when you unserialize it.
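
A rough sketch of that workaround, assuming every field in the record is a string; the helper names `pack_record` and `unpack_record` are hypothetical and only for illustration:

```php
<?php
// Hypothetical helpers; assumes every value in $record is a string.
function pack_record(array $record): string
{
    // base64 output is plain ASCII, so character count and byte count agree.
    return serialize(array_map('base64_encode', $record));
}

function unpack_record(string $stored): array
{
    // array_map() preserves the string keys when given a single array.
    return array_map('base64_decode', unserialize($stored));
}

$record = ['address' => 'Elågåresgude 41, 2200 Københamm N', 'telephone' => ''];
var_dump(unpack_record(pack_record($record)) === $record); // bool(true)
```

The trade-off is that the stored values are no longer human-readable and take roughly a third more space.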

Homer6
-1

I think one solution should be to test if unserialize worked. If not, delete it and reserialize it.

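$yourserializestring = '...';

// @ suppresses the notice that unserialize() emits on malformed input
$data = @unserialize($yourserializestring);
if ($data === false && $yourserializestring !== 'b:0;') {
    // unserialize() failed (and the input was not a serialized false),
    // so the string is corrupted and you should recreate it
} else {
    echo "ok";
}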
malletjo