You've really dug yourself into a very bad hole here.
The problem
The problem you're running into is that a newline (line feed and carriage return characters) are not valid JSON. They must be escaped as \n
and \r
. You can see the full JSON standard here here.
You need to do two things.
Fix your code
In spite of the fact that the JSON standard is comparatively simple, you should not create your JSON by hand. You already know why. You have to handle several edge cases and the like. Your users could enter anything on the page, and you need to make sure that it gets properly encoded no matter what.
You need to use a JSON serialization tool. json_encode
is built in as of 5.2. If you can't use this for any reason, find an existing, widely used (and therefore heavily tested) third party library with a JSON serializer.
If you're asking, "Why can't I create my own serializer?", you could, in theory. Realistically, there is no point. Yours won't be better than existing ones. It will be much more likely to have bugs and to perform worse than something a lot of people have used in production. It will also take much longer to create and test than using an existing one.
If you need this data in code after you pull it back out of the database, then you need a JSON deserializer. json_decode
should also be fine, but again, if you can't use it, look for a widely used third party library.
Fix your data
If you haven't hit production yet, you have really dodged a bullet here, and you can skip this whole section. If you have gone to production and you have data from users, you've got a major problem.
Even after you fix your code, you still have bad data in your production database that won't parse correctly. You have to do something to make this data usable. Unfortunately, it is impossible to automatically recover the original data for every possible case. This is because users might have entered the characters/substrings you added to the data to turn it into "JSON"; for example, they might have entered a comma separated list of quoted words: "dog","cat","pig", and "cow"
. That is an intractable problem, since you know for a fact you didn't properly serialize all your incoming input. There's no way to tell the difference between text your code generated and text the user entered. You're going to have to settle for a best effort and try to throw errors when you can't figure it out in code, and it might mess up a user's data in some special cases. You might have to fix some things manually.
Start by discussing this with your manager, team lead, whoever you answer to. Assuming that you can't lose the data, this is the most sound process to follow for creating a fix for your data:
- Create a database dump of your production data.
- Import that dump into a development database.
- Develop and test your method of repairing this data against the development database from the last step.
- Ensure you have a recovery plan for deployments gone wrong. Test this plan in your testing environment.
- Once you've gone through your typical release process, it's time to release the fixed code and the data update together.
- Take the website offline.
- Back up the database.
- Update the website with the new code.
- Implement your data fix.
- Verify that it worked.
- Bring the site online.
If your data fix doesn't work (possibly because you didn't think of an edge case or something), then you have a nice back up you can restore and you can cancel the release. Then go back to step 1.
As for how you can fix the data, I don't recommend queries here. I recommend a little script tool. It would have to load the data from the database, pull the string apart, try to identify all the pieces, build up an object from those pieces, and finally serialize them to JSON correctly, and put them back into the database.
Here's an example function of how you might go about pulling the string apart:
const ELEMENT_SEPARATOR = '","';
const PAIR_SEPARATOR = '":"';
function recover_object_from_malformed_json($malformed_json, $known_keys) {
$tempData = substr($malformed_json, 14); // Removes {"records":[{" prefix
$tempData = substr($tempData, 0, -4); // Removes "}]} suffix
$tempData = explode(ELEMENT_SEPARATOR, $tempData); // Split into what we think are pairs
$data = array();
$lastKey = NULL;
foreach ($tempData as $i) {
$explodedI = explode(KEY_VALUE_SEPARATOR, $i, 2); // Split what we think is a key/value into key and value
if (in_array($explodedI[0], $known_keys)) { // Check if it's actually a key
// It's a key
$lastKey = $explodedI[0];
if (array_key_exists($lastKey, $data)) {
throw new RuntimeException('Duplicate key: ' + $lastKey);
}
// Assign the value to the key
$data[$lastKey] = $explodedI[1];
}
else {
// This isn't a key vlue pair, near as we can tell
// So it must actually be part of the last value,
// and the user actually entered the delimiter as part of the value.
if (is_null($lastKey)) {
// This one is REALLY messed up
throw new RuntimeException('Does not begin with a known key');
}
$data[$lastKey] += ELEMENT_SEPARATOR;
$data[$lastKey] += $i;
}
}
return $data;
}
Note that I'm assuming that your "list" is a single element. This gets much harder and much messier if you have more than one. You'll also need to know ahead of time what keys you expect to have. The bottom line is that you have to undo whatever your code did to create the "JSON", and you have to do everything you can to try to not mess up a user's data.
You would use it something like this:
$knownKeys = ["Parking", "Description", "Chauffage"];
// Fetch your rows and loop over them
foreach ($dbRows as $row) {
try {
$dataFromDb = $row.myData // or however you would pull out this string.
$recoveredData = recover_object_from_malformed_json($dataFromDb);
// Save it back to the DB
$row.myData = json_encode($recoveredData);
// Make sure to commit here.
}
catch (Exception $e) {
// Log the row's ID, the content that couldn't be fixed, and the exception
// Make sure to roll back here
}
}
(Forgive me if the database stuff looks really wonky. I don't do PHP, so I have no idea how that code should look. Hopefully, you can at least get the concept.)
Why I don't recommend trying to parse your data as JSON to recover it.
The bottom line is that your data in the database is not JSON. IF you try to parse it as such, all the other edge cases you didn't handle properly will get screwed up in the process. You'll see bad things like
\\
becomes \
\j
becomes j
\t
becomes a tab character
In the end, it will just mess up your data even more.
Conclusion
This is a huge mess, and you should never try to convert something into a standard format without using a properly built, well tested serializer. Fixing the data is going to be hard, and it's going to take time. I also seriously doubt you have a lot of background in text processing techniques, and lacking that knowledge is going to make this harder. You can get some good info on text processing by studying how compilers are made. Good luck.