21

I have a massive multidimensional array that was serialized by PHP and stored in MySQL. The data field wasn't large enough, so the end of the string has been cut off. I need to extract the data, but unserialize() won't work. Does anyone know of code that can close all the open arrays and recalculate the string lengths? It's too much data to fix by hand.

Many thanks.

fabrik
Simon
  • 1
    This may be a useful resource for some people finding this question - I've used it many times and it's worked well every time: https://github.com/Blogestudio/Fix-Serialization (granted this would likely not help where a large portion of the string has been cut off - only when you've done a search and replace and the string lengths are off) – But those new buttons though.. Jun 02 '16 at 18:10

15 Answers

38

This recalculates the byte length of every string element in a serialized array:

$fixed = preg_replace_callback(
    '/s:([0-9]+):\"(.*?)\";/',
    function ($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";';     },
    $serialized
);

However, it doesn't work if your strings contain ";. In that case it's not possible to fix the serialized array string automatically -- manual editing will be needed.
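A minimal sketch of both cases, reusing the pattern from this answer. The helper name recalc_lengths is hypothetical and only wraps the callback above, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical wrapper around the callback from the answer above.
function recalc_lengths(string $serialized): string
{
    return preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; },
        $serialized
    );
}

// Case 1: only the byte count is wrong (3 instead of 5).
$fixed = recalc_lengths('a:1:{i:0;s:3:"hello";}');
$ok    = unserialize($fixed);           // array with the single value "hello"

// Case 2: a valid string whose value contains ";
$valid   = serialize(['say "hi"; now']);
$mangled = recalc_lengths($valid);      // the lazy match stops at the inner ";
$broken  = @unserialize($mangled);      // false: the "fix" damaged valid data
```

The second case is why it seems safer to run a fixer like this only after unserialize() has already failed on the untouched string.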

Emil M
24

Solution:

1) try online:

Serialized String Fixer (online tool)

2) Use function:

unserialize( serialize_corrector($serialized_string ) ) ;

code:

function serialize_corrector($serialized_string){
    // First check whether fixing is needed at all: unserialize() returns false on failure.
    // Then do a basic sanity check that the input looks like a serialized array, object or string.
    if ( @unserialize($serialized_string) === false && preg_match('/^[aOs]:/', $serialized_string) ) {
        $serialized_string = preg_replace_callback(
            '/s\:(\d+)\:\"(.*?)\";/s',
            function($matches){ return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; },
            $serialized_string
        );
    }
    return $serialized_string;
}

There is also this script, which I haven't tested.

T.Todua
20

I tried everything in this post and nothing worked for me. After hours of pain, here is what I found deep in Google's results, and it finally worked:

function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
    // securities
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    $string = preg_replace("%\n%", "", $string);
    // doublequote exploding
    $data = preg_replace('%";%', "µµµ", $string);
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%', 'fix_str_length', $line);
    }
    return $new_data;
}

You call the routine as follows:

//Let's consider we store the serialization inside a txt file
$corruptedSerialization = file_get_contents('corruptedSerialization.txt');

//Try to unserialize original string
$unSerialized = unserialize($corruptedSerialization);

//In case of failure let's try to repair it
if($unSerialized === false){
    $repairedSerialization = fix_serialized($corruptedSerialization);
    $unSerialized = unserialize($repairedSerialization);
}

//Keep your fingers crossed
var_dump($unSerialized);
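The comment in fix_str_length about strlen and UTF-8 can be checked with a short sketch. The sample word is an assumption chosen for illustration, and it is written with byte escapes so the result does not depend on the file's encoding.

```php
<?php
// "été" in UTF-8: 3 characters but 5 bytes.
$word = "\xC3\xA9t\xC3\xA9";

// serialize() records the byte length, which is what strlen() returns.
$serialized = serialize($word);

// A character count (3) in the length field makes the string unreadable.
$wrong  = 's:3:"' . $word . '";';
$result = @unserialize($wrong);

echo strlen($word), "\n";
echo $serialized, "\n";
```

This is why search-and-replace on serialized data with multibyte text tends to break it: the length field has to track bytes, not characters.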
Roman Newaza
Mishu Vlad
4

The following snippet attempts to recursively read and parse a damaged serialized string (blob data), for example one that was cut off because it was too long for its database column. Numeric primitives and booleans are guaranteed to be valid; strings may be cut off and array keys may be missing. The routine may be useful if recovering a significant part of the data (not all of it) is sufficient for you.

class Unserializer
{
    /**
    * Parse blob string tolerating corrupted strings & arrays
    * @param string $str Corrupted blob string
    */
    public static function parseCorruptedBlob(&$str)
    {
        // array pattern:    a:236:{...;}
        // integer pattern:  i:123;
        // double pattern:   d:329.0001122;
        // boolean pattern:  b:1; or b:0;
        // string pattern:   s:14:"date_departure";
        // null pattern:     N;
        // not supported: object O:{...}, reference R:{...}

        // NOTES:
        // - primitive types (bool, int, float) except for string are guaranteed uncorrupted
        // - arrays are tolerant to corrupted keys/values
        // - references & objects are not supported
        // - we use single byte string length calculation (strlen rather than mb_strlen) since source string is ISO-8859-2, not utf-8

        if(preg_match('/^a:(\d+):{/', $str, $match)){
            list($pattern, $cntItems) = $match;
            $str = substr($str, strlen($pattern));
            $array = [];
            for($i=0; $i<$cntItems; ++$i){
                $key = self::parseCorruptedBlob($str);
                if(trim($key)!==''){ // hmm, we wont allow null and "" as keys..
                    $array[$key] = self::parseCorruptedBlob($str);
                }
            }
            $str = ltrim($str, '}'); // closing array bracket
            return $array;
        }elseif(preg_match('/^s:(\d+):/', $str, $match)){
            list($pattern, $length) = $match;
            $str = substr($str, strlen($pattern));
            $val = substr($str, 0, $length + 2); // include also surrounding double quotes
            $str = substr($str, strlen($val) + 1); // include also semicolon
            $val = trim($val, '"'); // remove surrounding double quotes
            if(preg_match('/^a:(\d+):{/', $val)){
                // parse instantly another serialized array
                return (array) self::parseCorruptedBlob($val);
            }else{
                return (string) $val;
            }
        }elseif(preg_match('/^i:(-?\d+);/', $str, $match)){ // allow negative integers
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (int) $val;
        }elseif(preg_match('/^d:(-?\d+(?:\.\d+)?(?:E[+-]?\d+)?);/i', $str, $match)){ // allow sign and exponent
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (float) $val;
        }elseif(preg_match('/^b:(0|1);/', $str, $match)){
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (bool) $val;
        }elseif(preg_match('/^N;/', $str, $match)){
            $str = substr($str, strlen('N;'));
            return null;
        }
    }
}

// usage:
$unserialized = Unserializer::parseCorruptedBlob($serializedString);
lubosdz
3

This uses preg_replace_callback() instead of preg_replace() with the /e modifier, because /e was deprecated in PHP 5.5 and removed in PHP 7.0.

$fixed_serialized_String = preg_replace_callback('/s:([0-9]+):\"(.*?)\";/',function($match) {
    return "s:".strlen($match[2]).':"'.$match[2].'";';
}, $serializedString);

$correct_array= unserialize($fixed_serialized_String);
T.Todua
M Rostami
2

Best Solution for me:

$output_array = unserialize(My_checker($serialized_string));

code:

function My_checker($serialized_string){
    // securities
    if (empty($serialized_string))                      return '';
    if ( !preg_match('/^[aOs]:/', $serialized_string) ) return $serialized_string;
    if ( @unserialize($serialized_string) !== false ) return $serialized_string;

    return
    preg_replace_callback(
        '/s\:(\d+)\:\"(.*?)\";/s', 
        function ($matches){  return 's:'.strlen($matches[2]).':"'.$matches[2].'";';  },
        $serialized_string )
    ;
}
T.Todua
0

Based on @Emil M's answer, here is a version that works with text containing double quotes. Because (.*?) is lazy, a value that contains the sequence "; will still be cut short at that point.

function fix_broken_serialized_array($match) {
    return "s:".strlen($match[2]).":\"".$match[2]."\";"; 
}
$fixed = preg_replace_callback(
    '/s:([0-9]+):"(.*?)";/',
    "fix_broken_serialized_array",
    $serialized
);
Kamal Saleh
0

[UPD] Colleagues, I'm not sure whether this is allowed here, but for cases like this I have created my own tool and placed it on my own website. Please try it: https://saysimsim.ru/tools/SerializedDataEditor

[Old text] Conclusion :-) After 3 days (instead of the estimated 2 hours) of migrating a blessed WordPress website to a new domain name, I finally found this page! Colleagues, please consider this my "thank you very much indeed" for all your answers. The code below combines all of your solutions with almost no additions. JFYI: for me personally, SOLUTION 3 works most often. Kamal Saleh, you are the best!

function hlpSuperUnSerialize($str) {
    #region Simple Security
    if (
        empty($str)
        || !is_string($str)
        || !preg_match('/^[aOs]:/', $str)
    ) {
        return FALSE;
    }
    #endregion Simple Security

    #region SOLUTION 0
    // PHP default :-)
    $repSolNum = 0;
    $strFixed  = $str;
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 0

    #region SOLUTION 1
    // @link https://stackoverflow.com/a/5581004/3142281
    $repSolNum = 1;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($matches) { return "s:" . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
        $str
    );
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 1

    #region SOLUTION 2
    // @link https://stackoverflow.com/a/24995701/3142281
    $repSolNum = 2;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($match) {
            return "s:" . strlen($match[2]) . ':"' . $match[2] . '";';
        },
        $str);
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 2

    #region SOLUTION 3
    // @link https://stackoverflow.com/a/34224433/3142281
    $repSolNum = 3;
    // securities
    $strFixed = preg_replace("%\n%", "", $str);
    // doublequote exploding
    $data     = preg_replace('%";%', "µµµ", $strFixed);
    $tab      = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback(
            '%\bs:(\d+):"(.*)%',
            function ($matches) {
                $string       = $matches[2];
                $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count

                return 's:' . $right_length . ':"' . $string . '";';
            },
            $line);
    }
    $strFixed = $new_data;
    $arr      = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 3

    #region SOLUTION 4
    // @link https://stackoverflow.com/a/36454402/3142281
    $repSolNum = 4;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function ($match) {
            return "s:" . strlen($match[2]) . ":\"" . $match[2] . "\";";
        },
        $str
    );
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 4

    #region SOLUTION 5
    // @link https://stackoverflow.com/a/38890855/3142281
    $repSolNum = 5;
    $strFixed  = preg_replace_callback('/s\:(\d+)\:\"(.*?)\";/s', function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; }, $str);
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 5

    #region SOLUTION 6
    // @link https://stackoverflow.com/a/38891026/3142281
    $repSolNum = 6;
    $strFixed  = preg_replace_callback(
        '/s\:(\d+)\:\"(.*?)\";/s',
        function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
        $str);
    $arr = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 6
    error_log('Completely unable to deserialize.');

    return FALSE;
}
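The function above repeats the same "fix, unserialize, log" block for every solution. As a hedged sketch of one way to shorten it, the fixers can be kept in an array and tried in order. The name try_fixers and the two fixers shown are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: run each fixer in turn and return the first result
// that unserialize() accepts, together with the name of the fixer that worked.
function try_fixers(string $str, array $fixers)
{
    foreach ($fixers as $name => $fixer) {
        $candidate = $fixer($str);
        $value     = @unserialize($candidate);
        // 'b:0;' is the one valid input that legitimately unserializes to false.
        if ($value !== false || $candidate === 'b:0;') {
            return ['fixer' => $name, 'value' => $value];
        }
    }
    return null; // nothing worked
}

$fixers = [
    'as-is'   => function (string $s): string { return $s; },
    'recount' => function (string $s): string {
        return preg_replace_callback(
            '/s:(\d+):"(.*?)";/s',
            function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; },
            $s
        );
    },
];

$result = try_fixers('a:1:{i:0;s:99:"abc";}', $fixers);
print_r($result);
```

Adding another strategy then means adding one array entry rather than another copy of the block.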
0

We had some issues with this as well. In the end, we used a modified version of Roman Newaza's answer, which also works for data containing line breaks.

<?php 


$mysql = mysqli_connect("localhost", "...", "...", "...");
$res = mysqli_query($mysql, "SELECT option_id,option_value from ... where option_value like 'a:%'");

$prep = mysqli_prepare($mysql, "UPDATE ... set option_value = ? where option_id = ?");


function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    $data = preg_replace('%";%', "µµµ", $string);
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%s', 'fix_str_length', $line);
    }
    return $new_data;
}

while ( $val = mysqli_fetch_row($res) ) {
  $y = $val[0];
  $x = $val[1];

  $unSerialized = unserialize($x);

  //In case of failure let's try to repair it
  if($unSerialized === false){
      echo "fixing $y\n";
      $repairedSerialization = fix_serialized($x);
      //$unSerialized = unserialize($repairedSerialization);
      mysqli_stmt_bind_param($prep, "si", $repairedSerialization, $y);
      mysqli_stmt_execute($prep);
  }

}
Jonas Hünig
0

The top-voted answer does not fix a serialized array that contains an unquoted string value, such as a:1:{i:0;s:2:14;}

function unserialize_corrupted(string $str): array {
    // Fix serialized array with unquoted strings
    if(preg_match('/^(a:\d+:{)/', $str)) {
        preg_match_all('/(s:\d+:(?!").+(?!");)/U', $str, $pm_corruptedStringValues);

        foreach($pm_corruptedStringValues[0] as $_corruptedStringValue) {
            // Get post string data
            preg_match('/^(s:\d+:)/', $_corruptedStringValue, $pm_strBase);

            // Get unquoted string
            $stringValue = substr($_corruptedStringValue, strlen($pm_strBase[0]), -1);
            // Rebuild serialized data with quoted string
            $correctedStringValue = "$pm_strBase[0]\"$stringValue\";";

            // replace corrupted data
            $str = str_replace($_corruptedStringValue, $correctedStringValue, $str);
        }
    }

    // Fix offset error
    $str = preg_replace_callback(
        '/s:(\d+):\"(.*?)\";/',
        function($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";'; },
        $str
    );

    $unserializedString = unserialize($str);

    if($unserializedString === false) {
        // Return empty array if string can't be fixed
        $unserializedString = array();
    }

    return $unserializedString;
}
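A short sketch of why the unquoted case needs its own step. It reuses the pattern from the top-voted answer; the variable names are assumptions for illustration.

```php
<?php
$unquoted = 'a:1:{i:0;s:2:14;}';

// The top-voted pattern requires quotes around the value, so it never matches here.
$after = preg_replace_callback(
    '/s:([0-9]+):"(.*?)";/',
    function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; },
    $unquoted
);
$unchanged   = ($after === $unquoted);
$stillBroken = @unserialize($after);

// Once the quotes are restored, the string parses normally.
$parsed = unserialize('a:1:{i:0;s:2:"14";}');
print_r($parsed);
```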
Preciel
0

Based on @Preciel's solution; this version fixes objects too.

public function unserialize(string $string2array): array {
    if (preg_match('/^(a:\d+:{)/', $string2array)) {
        preg_match_all('/((s:\d+:(?!").+(?!");)|(O:\d+:(?!").+(?!"):))/U', $string2array, $matches);
        foreach ($matches[0] as $match) {
            preg_match('/^((s|O):\d+:)/', $match, $strBase);
            $stringValue = substr($match, strlen($strBase[0]), -1);
            $endSymbol = substr($match, -1);
            $fixedValue = $strBase[2] . ':' . strlen($stringValue) . ':"' . $stringValue . '"' . $endSymbol;
            $string2array = str_replace($match, $fixedValue, $string2array);
        }
    }

    $string2array = preg_replace_callback(
        '/(a|s|b|d|i):(\d+):\"(.*?)\";/',
        function ($matches) {
            return $matches[1] . ":" . strlen($matches[3]) . ':"' . $matches[3] . '";';
        },
        $string2array
    );

    $unserializedString = (!empty($string2array) && @unserialize($string2array)) ? unserialize($string2array) : array();
    return $unserializedString;
}
-2

I doubt anyone would write code to retrieve partially saved arrays :) I fixed something like this once, but by hand, and it took hours; then I realized I didn't need that part of the array...

Unless it's really important data (and I mean REALLY important), you'd be better off letting this one go.

Quamis
-3

You can return invalid serialized data back to normal, by way of an array :)

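$str = "a:1:{i:0;a:4:{s:4:\"name\";s:26:\"20141023_544909d85b868.rar\";s:5:\"dname\";s:20:\"HTxRcEBC0JFRWhtk.rar\";s:4:\"size\";i:19935;s:4:\"dead\";i:0;}}";

// NOTE: the pattern $re is not defined anywhere in this post. The loop below reads
// $matches[1] and $matches[2], so the pattern needs two capture groups.
preg_match_all($re, $str, $matches);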

if(is_array($matches) && !empty($matches[1]) && !empty($matches[2]))
{
    foreach($matches[1] as $ksel => $serv)
    {
        if(!empty($serv))
        {
            $retva[] = $serv;
        }else{
            $retva[] = $matches[2][$ksel];
        }
    }

    $count = 0;
    $arrk = array();
    $arrv = array();
    if(is_array($retva))
    {
        foreach($retva as $k => $va)
        {
            ++$count;
            if($count/2 == 1)
            {
                $arrv[] = $va;
                $count = 0;
            }else{
                $arrk[] = $va;
            }
        }
        $returnse = array_combine($arrk,$arrv);
    }

}

print_r($returnse);
Mike Kormendy
Mahran Elneel
-4

Serializing is almost always bad because you can't search it in any way. Sorry, but it seems as though you're backed into a corner...

Ben
-5

I think this is almost impossible. Before you can repair your array, you need to know how it is damaged. How many children are missing? What was their content?

Sorry, but IMHO you can't do it.

Proof:

<?php

$serialized = serialize(
    [
        'one'   => 1,
        'two'   => 'nice',
        'three' => 'will be damaged'
    ]
);

var_dump($serialized); // a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}')); // please note 'tee'

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:')); // serialized string is truncated

Link: https://ideone.com/uvISQu

Even if you can recalculate the lengths of your keys and values, you cannot trust the data retrieved from this source, because you cannot recover the values themselves. E.g. if the serialized data is an object, its properties won't be accessible anymore.
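If recovering only the entries that survived intact is acceptable, a tolerant reader can be sketched as follows. This is an illustration under stated assumptions, not a general repair tool: salvage_value is a hypothetical helper, it handles only N, b, i, d, s and nested a, and it cannot bring back anything that was cut off.

```php
<?php
// Hypothetical helper: reads one serialized value starting at $pos and sets
// $complete to false as soon as the input ends early or looks malformed.
function salvage_value(string $s, int &$pos, bool &$complete)
{
    if (preg_match('/\GN;/', $s, $m, 0, $pos)) {
        $pos += 2;
        return null;
    }
    if (preg_match('/\Gb:([01]);/', $s, $m, 0, $pos)) {
        $pos += 4;
        return $m[1] === '1';
    }
    if (preg_match('/\Gi:(-?\d+);/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return (int) $m[1];
    }
    if (preg_match('/\Gd:(-?\d+(?:\.\d+)?(?:E[+-]?\d+)?);/i', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return (float) $m[1];
    }
    if (preg_match('/\Gs:(\d+):"/', $s, $m, 0, $pos)) {
        $start = $pos + strlen($m[0]);
        $len   = (int) $m[1];
        $value = (string) substr($s, $start, $len);
        if (substr($s, $start + $len, 2) === '";') {
            $pos = $start + $len + 2;
            return $value;
        }
        $complete = false;   // string body or closing "; was cut off
        $pos = strlen($s);
        return $value;       // whatever bytes survived
    }
    if (preg_match('/\Ga:(\d+):\{/', $s, $m, 0, $pos)) {
        $pos  += strlen($m[0]);
        $count = (int) $m[1];
        $out   = [];
        for ($i = 0; $i < $count && $complete; $i++) {
            $key = salvage_value($s, $pos, $complete);
            if (!$complete || !(is_int($key) || is_string($key))) {
                $complete = false;   // the key itself is damaged: stop here
                break;
            }
            $value = salvage_value($s, $pos, $complete);
            if ($complete || $value !== null) {
                $out[$key] = $value; // keeps partial strings and partial arrays
            }
        }
        if ($complete && substr($s, $pos, 1) === '}') {
            $pos++;
        } else {
            $complete = false;
        }
        return $out;
    }
    $complete = false;       // unknown or truncated token
    $pos = strlen($s);
    return null;
}

// The truncated string from the proof above.
$truncated = 'a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:';
$pos = 0;
$complete = true;
$salvaged = salvage_value($truncated, $pos, $complete);
print_r($salvaged);
```

The $complete flag is the important part of the design: callers can tell a clean parse from a salvage and decide how far to trust the result.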

fabrik
  • Ultimately, it is not your answer that is wrong, it is the question that is Unclear / Cannot be reproduced. I got sucked in by all of the other answers that dropped in byte count adjusting snippets and didn't read the question well enough. This page should be closed and I should find a better home for my answer. My apologies for poking your old post. – mickmackusa Apr 08 '19 at 01:23
  • I don't necessarily agree. the scenario op described is a real one. given some serialized data which has been damaged, and op wanted to know if there's any way to fix it using regular expressions. I still think there's no way to do that, (at least not with regular expressions) because it'd be a guesswork – fabrik Apr 08 '19 at 03:34
  • Yeah. Okay, again, I think you are right. I think your answer is the only correct answer on the page. I'll be removing mine when I get a chance. Your downvote tally is misleading. Perhaps you could rephrase your wording so that it doesn't look like you are asking questions. – mickmackusa Apr 08 '19 at 03:38
  • that's a good idea, I'll do it when I'll be at my desk. thanks – fabrik Apr 08 '19 at 03:39
  • hey @mickmackusa, just added some more details to my answer. – fabrik Apr 08 '19 at 13:30
  • Your demo does not show truncation. Your demo shows mid-string value modification that can be fixed by one of the simple preg calls posted below. Looks like I can't get lubosdz's answer to work out-of-the-box on a simple example. https://3v4l.org/8AObr – mickmackusa Apr 08 '19 at 21:24
  • then you have even less chance to restore – fabrik Apr 08 '19 at 21:26
  • @mickmackusa sorry, but what are you doing with this question? editing title/description of an already answered (and don't get me wrong, I don't care about meaningless internet points) question is plain wrong, you're faking the context. why do you feel you need to do this? – fabrik Apr 09 '19 at 07:57
  • I have added clarity to the question because there are so many answers that do not understand the malformation of the incoming serialized string. You see, the answers that are correcting the byte count will NEVER work. Furthermore, people use old pages to close new pages. It is important to clarify what this page is supposed to solve so that new closures are not misappropriated. – mickmackusa Apr 09 '19 at 08:00
  • @mickmackusa no, you are vandalizing content, and after all adding the very same answer I've added before. what's your point? – fabrik Apr 09 '19 at 08:04
  • There is absolutely no vandalism here. Since I had to explain how your mock data was inappropriate, I am starting to think that you don't understand the issue. I have made the question more clear and provided a new title that will not mislead volunteers and researchers. I'll actually delete my answer and only undelete it after I have developed a repairing function. – mickmackusa Apr 09 '19 at 10:03
  • My point is to make this page clear for researchers that are trying to salvage their own truncated serialized string and clear for volunteers that want to try to answer. This page is horribly bloated with answers that will not possibly help researchers with corruption via truncation. – mickmackusa Apr 09 '19 at 10:03
  • @mickmackusa how my example is inappropriate? what's the difference in a missing string chain/truncated serialized data? nothing, since both will return false in the end. also, what kind of researchers are you talking about? will you cast a delete vote on all answers except yours? where is your example? how is your answer is different than mine or others? chill, man – fabrik Apr 09 '19 at 10:45
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/191528/discussion-between-mickmackusa-and-fabrik). – mickmackusa Apr 09 '19 at 11:48