I know this is an old question, but better late than never, I suppose. I ran into this problem recently, after inheriting a database that had had a find/replace executed on serialized data. After many hours of researching, I discovered that this was because the string counts were off. Unfortunately, there was so much data with lots of escaping and newlines and I didn't know how to count in some cases and I had so much data that I needed something automated.
Along the way, I stumbled across this question and Benubird's post helped put me on the right path. His example code did not work in production use on complex data, containing numerous special characters and HTML, with very deep levels of nesting, and it did not properly handle certain escaped characters and encoding. So I modified it a bit and spent countless hours working through additional bugs to get my version to "fix" the serialized data.
// do some DB query here
while($res = db_fetch($qry)){
$str = $res->data;
$sCount=1; // don't try to count manually, which can be inaccurate; let serialize do its thing
$newstring = unserialize($str);
if(!$newstring) {
preg_match_all('/s:([0-9]+):"(.*?)"(?=;)/su',$str,$m);
# preg_match_all("/s:([0-9]+):(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")(?=;)/u",$str,$m); // alternate: almost works but leave quotes in $m[2] output
# print_r($m); exit;
foreach($m[1] as $k => $len) {
/*** Possibly specific to my case: Spyropress Builder in WordPress ***/
$m_clean = str_replace('\"','"',$m[2][$k]); // convert escaped double quotes so that HTML will render properly
// if newline is present, it will output directly in the HTML
// nl2br won't work here (must find literally; not with double quotes!)
$m_clean = str_replace('\n', '<br />', $m_clean);
$m_clean = nl2br($m_clean); // but we DO need to convert actual newlines also
/*********************************************************************/
if($sCount){
$m_new = $m[0][$k].';'; // we must account for the missing semi-colon not captured in regex!
// NOTE: If we don't flush the buffers, things like <img src="http://whatever" can be replaced with <img src="//whatever" and break the serialize count!!!
ob_end_flush(); // not sure why this is necessary but cost me 5 hours!!
$m_ser = serialize($m_clean);
if($m_new != $m_ser) {
print "Replacing: $m_new\n";
print "With: $m_ser\n";
$str = str_replace($m_new, $m_ser, $str);
}
}
else{
$m_len = (strlen($m[2][$k]) - substr_count($m[2][$k],'\n'));
if($len != $m_len) {
$newstr='s:'.$m_len.':"'.$m[2][$k].'"';
echo "Replacing: {$m[0][$k]}\n";
echo "With: $newstr\n\n";
$str = str_replace($m_new, $newstr, $str);
}
}
}
print_r($str); // this is your FIXED serialized data!! Yay!
}
}
A little geeky explanation on my changes:
- I found that trying to count with Benubird's code as a base was too inaccurate for large datasets, so I ended up just using serialize to be sure the count was accurate.
- I avoided the try/catch because, in my case, the try would succeed but just returned an empty string. So, I check for empty data instead.
- I tried numerous regex's but only a mod on Benubird's would accurately handle all cases. Specifically, I had to modify the part that checked for the ";" because it would match on CSS like "width:100%; height:25px;" and broke the output. So, I used a positive lookahead to only match when the ";" was outside of the set of double quotes.
- My case had lots of newlines, HTML, and escaped double quotes, so I had to add a block to clean that up.
- There were a couple of weird situations where data would be replaced incorrectly by the regex and then the serialize would count it incorrectly as well. I found NOTHING on any sites to help with this and finally thought it might be related to caching or something like that and tried flushing the output buffer (ob_end_flush()), which worked, thank goodness!
Hope this helps someone... Took me almost 20 hours including the research and dealing with weird issues! :)