-1

I have an issue when I try to parse my JSON. I create my JSON "by my hand" like this in PHP : $outp ='{"records":['.$outp.']}'; and I create it so I can take field from my database to show them in the page. The thing is, in my database I have a field "description" where people can give a description about something. So some people make return to line like this for example :

Interphone

Equipe:
Canape-lit

Autre:
Local

And when I try to parse my JSON there is an error because of these line's return. "SyntaxError: Unexpected token".

Here's an example of my JSON :

{"records":[{"Parking":"Aucun","Description":"Interphone

Equipé :
Canapé-lit

","Chauffage":"Fioul"}]}

Can someone help me please ?

Bankkay
  • 21
  • 1
  • 6
  • What is "my JSON" in this case, and what's producing it? We don't have nearly enough context here. – Jon Skeet Sep 23 '15 at 16:55
  • Hi, sorry i've add a short example of my JSON it cames from my database when I take all the fields – Bankkay Sep 23 '15 at 17:33
  • 1
    I think a better question is how are you *generating* that JSON before you stuff it in the database? A good JSON serializer should take care of escaping those newlines for you. What tool are you using for serializing and deserializing your JSON? Also, unless you're using built in database JSON features (like PostgreSQL's), the fact you have a database is irrelevant here. (I'm also pretty sure PostgreSQL wouldn't have even allowed you to insert invalid JSON if you had declared the column to be of type `JSON`.) – jpmc26 Sep 23 '15 at 17:34
  • Escape your newlines: http://stackoverflow.com/questions/42068/how-do-i-handle-newlines-in-json. How are you building this json? Any common tools should do this for you. – Chris Farmer Sep 23 '15 at 17:39
  • This is not to store thing in my database but to take them from database to a JSON so I can show them in a page. The JSON is created "by hand" with something like this "$outp ='{"records":['.$outp.']}';" (in php) – Bankkay Sep 23 '15 at 17:43
  • Don't do it by hand. Find a library. – Chris Farmer Sep 23 '15 at 17:46
  • So you think the best thing i've to do is to do everything again to have a "good" JSON ? – Bankkay Sep 23 '15 at 17:47
  • 2
    There are lots of opportunities to screw up your homemade json. This can amount to anything from inconvenience to hard to find bugs to security issues. Just find a library for your platform that is common and reliable. You will be happier. – Chris Farmer Sep 23 '15 at 17:50
  • What's your database (MySQL, PostgreSQL, SQL Server, Oracle)? – jpmc26 Sep 23 '15 at 17:53
  • Thanks a lot I think i'm gonna do that this will be better. My database is MySQL, I made the JSON in PHP to send it to IONIC Framework – Bankkay Sep 23 '15 at 18:57

1 Answers1

1

You've really dug yourself into a very bad hole here.

The problem

The problem you're running into is that a newline (line feed and carriage return characters) are not valid JSON. They must be escaped as \n and \r. You can see the full JSON standard here here.

You need to do two things.

Fix your code

In spite of the fact that the JSON standard is comparatively simple, you should not create your JSON by hand. You already know why. You have to handle several edge cases and the like. Your users could enter anything on the page, and you need to make sure that it gets properly encoded no matter what.

You need to use a JSON serialization tool. json_encode is built in as of 5.2. If you can't use this for any reason, find an existing, widely used (and therefore heavily tested) third party library with a JSON serializer.

If you're asking, "Why can't I create my own serializer?", you could, in theory. Realistically, there is no point. Yours won't be better than existing ones. It will be much more likely to have bugs and to perform worse than something a lot of people have used in production. It will also take much longer to create and test than using an existing one.

If you need this data in code after you pull it back out of the database, then you need a JSON deserializer. json_decode should also be fine, but again, if you can't use it, look for a widely used third party library.

Fix your data

If you haven't hit production yet, you have really dodged a bullet here, and you can skip this whole section. If you have gone to production and you have data from users, you've got a major problem.

Even after you fix your code, you still have bad data in your production database that won't parse correctly. You have to do something to make this data usable. Unfortunately, it is impossible to automatically recover the original data for every possible case. This is because users might have entered the characters/substrings you added to the data to turn it into "JSON"; for example, they might have entered a comma separated list of quoted words: "dog","cat","pig", and "cow". That is an intractable problem, since you know for a fact you didn't properly serialize all your incoming input. There's no way to tell the difference between text your code generated and text the user entered. You're going to have to settle for a best effort and try to throw errors when you can't figure it out in code, and it might mess up a user's data in some special cases. You might have to fix some things manually.

Start by discussing this with your manager, team lead, whoever you answer to. Assuming that you can't lose the data, this is the most sound process to follow for creating a fix for your data:

  1. Create a database dump of your production data.
  2. Import that dump into a development database.
  3. Develop and test your method of repairing this data against the development database from the last step.
  4. Ensure you have a recovery plan for deployments gone wrong. Test this plan in your testing environment.
  5. Once you've gone through your typical release process, it's time to release the fixed code and the data update together.
  6. Take the website offline.
  7. Back up the database.
  8. Update the website with the new code.
  9. Implement your data fix.
  10. Verify that it worked.
  11. Bring the site online.

If your data fix doesn't work (possibly because you didn't think of an edge case or something), then you have a nice back up you can restore and you can cancel the release. Then go back to step 1.

As for how you can fix the data, I don't recommend queries here. I recommend a little script tool. It would have to load the data from the database, pull the string apart, try to identify all the pieces, build up an object from those pieces, and finally serialize them to JSON correctly, and put them back into the database.

Here's an example function of how you might go about pulling the string apart:

const ELEMENT_SEPARATOR = '","';
const PAIR_SEPARATOR = '":"';

function recover_object_from_malformed_json($malformed_json, $known_keys) {
    $tempData = substr($malformed_json, 14); // Removes {"records":[{" prefix
    $tempData = substr($tempData, 0, -4); // Removes "}]} suffix
    $tempData = explode(ELEMENT_SEPARATOR, $tempData); // Split into what we think are pairs

    $data = array();
    $lastKey = NULL;

    foreach ($tempData as $i) {
        $explodedI = explode(KEY_VALUE_SEPARATOR, $i, 2); // Split what we think is a key/value into key and value
        if (in_array($explodedI[0], $known_keys)) { // Check if it's actually a key
            // It's a key
            $lastKey = $explodedI[0];
            if (array_key_exists($lastKey, $data)) {
                throw new RuntimeException('Duplicate key: ' + $lastKey);
            }
            // Assign the value to the key
            $data[$lastKey] = $explodedI[1];
        }
        else {
            // This isn't a key vlue pair, near as we can tell
            // So it must actually be part of the last value,
            // and the user actually entered the delimiter as part of the value.
            if (is_null($lastKey)) {
                // This one is REALLY messed up
                throw new RuntimeException('Does not begin with a known key');
            }

            $data[$lastKey] += ELEMENT_SEPARATOR;
            $data[$lastKey] += $i;
        }
    }

    return $data;
}

Note that I'm assuming that your "list" is a single element. This gets much harder and much messier if you have more than one. You'll also need to know ahead of time what keys you expect to have. The bottom line is that you have to undo whatever your code did to create the "JSON", and you have to do everything you can to try to not mess up a user's data.

You would use it something like this:

$knownKeys = ["Parking", "Description", "Chauffage"];

// Fetch your rows and loop over them
foreach ($dbRows as $row) {
    try {
        $dataFromDb = $row.myData // or however you would pull out this string.
        $recoveredData = recover_object_from_malformed_json($dataFromDb);

        // Save it back to the DB
        $row.myData = json_encode($recoveredData);
        // Make sure to commit here.
    }
    catch (Exception $e) {
        // Log the row's ID, the content that couldn't be fixed, and the exception
        // Make sure to roll back here
    }
}

(Forgive me if the database stuff looks really wonky. I don't do PHP, so I have no idea how that code should look. Hopefully, you can at least get the concept.)

Why I don't recommend trying to parse your data as JSON to recover it.

The bottom line is that your data in the database is not JSON. IF you try to parse it as such, all the other edge cases you didn't handle properly will get screwed up in the process. You'll see bad things like

  • \\ becomes \
  • \j becomes j
  • \t becomes a tab character

In the end, it will just mess up your data even more.

Conclusion

This is a huge mess, and you should never try to convert something into a standard format without using a properly built, well tested serializer. Fixing the data is going to be hard, and it's going to take time. I also seriously doubt you have a lot of background in text processing techniques, and lacking that knowledge is going to make this harder. You can get some good info on text processing by studying how compilers are made. Good luck.

jpmc26
  • 28,463
  • 14
  • 94
  • 146
  • I don't disagree with anything you've said here. But it could be that he inherited this design/db, or he is trying to prototype something, or solve an interview question. He may not have the means to overhaul everything at this point, and I was suggesting things that he could try despite the existing infrastructure. – ergonaut Sep 23 '15 at 23:49
  • @ergonaut If I were conducting an interview and posed this situation, I would be horrified at someone suggesting a simple regex replacement. I would be looking for them to recognize the scale of the problem and realize that fixing it can easily break some user data and that we need to be super careful to break as little as possible. The way I read the question, the OP explicitly states that they are the ones responsible. I also try to suggest something that can be done, but I don't feel your answer properly addresses the scope of the problem and the inherent risks and impossibilities. – jpmc26 Sep 24 '15 at 00:22
  • @ergonaut I understand that's a little harsh, but computers are very unforgiving. Additionally, StackOverflow explicitly asks users to provide content that fully addresses the issue. If we simplify too much, then other users who come along won't be helped, and the OP won't learn. – jpmc26 Sep 24 '15 at 00:27
  • Thanks for everything you said. I find a way. I create in JSON, in that JSON there are JWT (which contains my databases fields, some in a JSON). Everything works fine and it's OP for security i think – Bankkay Sep 25 '15 at 21:42