
I have a JSON file that has a lot of double quotes inside the values. The file contains almost 27,000 records.

I want to remove or replace the double quotes inside the values, because otherwise the file is not accepted as valid JSON. How can I do that?

The problem is that some records have a single double quote inside the value, while others have several.

Instead of replacing or removing the quotes, it would also be fine to remove the entire key and value, since I am not going to use it anyway. Is that any easier?

Here is a sample of one record in the JSON file:

 {
  "adlibJSON": {
    "recordList": {
      "record": [
        {
          "@attributes": {
            "priref": "4372",
            "created": "2011-12-09T23:09:57",
            "modification": "2012-08-11T17:07:51",
            "selected": "False"
          },
          "acquisition.date": [
            "1954"
          ],
          "documentation.title": [
            "A lot of text with a lot of extra double quotes like "this" and "this""
          ] ... ...

The problem lies in the value of the key documentation.title. I have Sublime Text 2, which I use for find and replace.

Jeff
user1386906
  • I'd try very hard to go back to the source and fix whatever is creating the JSON rather than trying to repair the broken data. – Quentin Jan 29 '13 at 13:35
  • Yes, but it's from a server that I can't manipulate. – user1386906 Jan 29 '13 at 13:38
  • If you can find a way to locate the value itself (e.g. everything inside `[...]`) you could just strip *all* quotes, then put quotes back in around the outside. But if you can do that you may as well just remove the value. – Jeff Jan 29 '13 at 20:45

3 Answers


There is a way, but to use it you must be sure that you can make the following assumptions about your data:

  • "documentation.title" must appear only once in your data, where it is used as a key.
  • The array value referred to by "documentation.title" should have only one element.
  • The character "]" should not appear in the value.

Then you would follow these steps:

/* s holds the raw JSON text. Find the index of the first "[" after "documentation.title". */
var n = s.indexOf("[", s.indexOf('"documentation.title"'));

/* Find the index of the closing "]". */
var n2 = s.indexOf("]", n);

/* Get the substring enclosed by these indexes, trimmed so that line breaks and indentation do not end up inside the value. */
var x = s.substring(n + 1, n2).trim();

/* Remove every double quote in this string and rebuild the original string with the corrected value. */
s = s.substring(0, n) + '["' + x.replace(/"/g, "") + '"]' + s.substring(n2 + 1);

Edit: if you are not interested in keeping the corrected value itself, you can just replace it with an empty string.
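Since the file in the question holds thousands of records, the same steps can be repeated for each occurrence of the key. The following is only a sketch: the function name `fixTitles` and its optional `replacement` parameter are made up for illustration, and it still assumes that each array holds a single string and that "]" never appears inside the value. It also does not handle backslashes inside the value.

```javascript
// Hypothetical helper: applies the indexOf steps to every "documentation.title"
// array in the raw text. Assumes one string per array and no "]" in the value.
function fixTitles(s, replacement) {
  var key = '"documentation.title"';
  var out = "";
  var pos = 0;
  while (true) {
    var k = s.indexOf(key, pos);
    if (k === -1) break;                 // no more occurrences of the key
    var n = s.indexOf("[", k);           // opening bracket of the array
    var n2 = n === -1 ? -1 : s.indexOf("]", n);
    if (n2 === -1) break;                // malformed tail: leave the rest untouched
    var x = s.substring(n + 1, n2).trim();
    // Either strip the quotes from the value or swap in a fixed replacement.
    var value = replacement !== undefined ? replacement : x.replace(/"/g, "");
    out += s.substring(pos, n) + '["' + value + '"]';
    pos = n2 + 1;
  }
  return out + s.substring(pos);
}

var sample = '{"documentation.title": [\n  "A lot of text like "this" and "this""\n]}';
var parsed = JSON.parse(fixTitles(sample)); // throws if the result is still invalid
console.log(parsed["documentation.title"][0]);
```

Passing an empty string as the second argument, as in `fixTitles(sample, "")`, discards the value instead of cleaning it.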

Cyrille Ka

I don't think you can do this reliably with a regular expression, since JSON is not a regular language.

You'll probably have similar troubles to those incurred by parsing HTML with regex.

I think you'll have to write (or find if you're super lucky) some kind of parser yourself...

Community
Jeff

Try this:

json.replace(/(^\s*|:\s*)"/gm, '$1[sentinel]')
    .replace(/"(,?\s*$|:)/gm, '[sentinel]$1')
    .replace(/"/g, '\\"').replace(/\[sentinel\]/g, '"');

Demo here: http://jsfiddle.net/D83FD/

This isn't a perfect solution; it's possible that the data could be formatted in such a way that it breaks the regular expression. Try it and see if it works for a larger data set.

Essentially we are finding opening quotes and replacing them with a placeholder value, finding closing quotes and replacing them with the placeholder, backslash-escaping all remaining quotes, and then replacing the placeholders with quotes again.
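To check the result on your own data, the replacement chain can be wrapped in a function and the output passed to `JSON.parse`, which throws if the text is still invalid. This is only a sketch: the function name `escapeInnerQuotes` is made up for illustration, and the approach relies on the pretty-printed layout shown in the question, where every opening quote starts a line or follows a colon and every closing quote ends a line or precedes a colon.

```javascript
// Hypothetical wrapper around the replacement chain from this answer.
function escapeInnerQuotes(json) {
  return json
    .replace(/(^\s*|:\s*)"/gm, '$1[sentinel]')  // mark opening quotes
    .replace(/"(,?\s*$|:)/gm, '[sentinel]$1')   // mark closing quotes
    .replace(/"/g, '\\"')                       // escape every quote that is left
    .replace(/\[sentinel\]/g, '"');             // turn the markers back into quotes
}

var broken = [
  '{',
  '  "acquisition.date": [',
  '    "1954"',
  '  ],',
  '  "documentation.title": [',
  '    "A lot of text with "this" and "this""',
  '  ]',
  '}'
].join("\n");

var data = JSON.parse(escapeInnerQuotes(broken)); // throws a SyntaxError on failure
console.log(data["documentation.title"][0]);
```

A value that itself contains the text `[sentinel]`, or a quote directly followed by a colon, would defeat this sketch, so it is worth running the parse check over the whole file before trusting the output.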

Dagg Nabbit