48

As mentioned in this StackOverflow question, trailing commas are not allowed in JSON. For example, this

{
    "key1": "value1",
    "key2": "value2"
}

is fine, but this

{
    "key1": "value1",
    "key2": "value2",
}

is invalid syntax.
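
For reference, a minimal check of how Python's standard json module treats the two forms above. The helper name `is_valid_json` is only an illustration and not part of the json module:

```python
import json

def is_valid_json(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_valid_json('{"key1": "value1", "key2": "value2"}'))   # True
print(is_valid_json('{"key1": "value1", "key2": "value2",}'))  # False
```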

For reasons mentioned in this other StackOverflow question, using a trailing comma is legal (and perhaps encouraged?) in Python code. I am working with both Python and JSON, so I would love to be able to be consistent across both types of files. Is there a way to have json.loads ignore trailing commas?

Rob Watts
  • In short, no. The best practices or preferred approaches for one language have no bearing for the best practices in another. – g.d.d.c May 16 '14 at 22:14
  • In JSON, it’s invalid, so no, the JSON parser will report that as an invalid format (correct behavior!). If it’s a Python dictionary, you could parse it using `ast.literal_eval`. – poke May 16 '14 at 22:14
  • I don't think so. The json module adheres to the standard. You might be able to hack your own version, though. – daniel kullmann May 16 '14 at 22:20
  • "I am working with both Python and JSON, so I would love to be able to be consistent across both types of files" - one type of file is JSON. What's the other? Python modules? `print`ed Python data structures? – user2357112 May 17 '14 at 03:53
  • The second example you gave isn't JSON, but it is HOCON. https://github.com/typesafehub/config/blob/master/HOCON.md Kind of makes me want to write a parser for python... – Chris Martin May 17 '14 at 04:10
  • @ChrisMartin - Hmmm, nope. Skimmed the spec, defines **control-characters** as **from the JSON spec** (big can of " \/worms/ **^I** **^H** "). It also will not accept numbers starting with a decimal (so javascript declarations like `{number: .75, number2: .1E2}` would be invalid). It employs `#` and `//` for comments, but provides no `/* block comment method */`. Other than that, it's awesome. – Orwellophile Jul 17 '16 at 19:38
  • Possible duplicate of [Parsing "JSON" containing trailing commas](https://stackoverflow.com/questions/11052952/parsing-json-containing-trailing-commas) – jpmc26 May 11 '18 at 04:12
  • use yaml instead - see https://stackoverflow.com/a/63555547/9201239 – stason Sep 08 '20 at 22:31
  • @chris-martin: https://github.com/chimpler/pyhocon – Aiyion.Prime Feb 09 '21 at 10:08

7 Answers

15

Fast forward to 2021, now we have https://pypi.org/project/json5/

A quote from the link:

A Python implementation of the JSON5 data format.

JSON5 extends the JSON data interchange format to make it slightly more usable as a configuration language:

  • JavaScript-style comments (both single and multi-line) are legal.
  • Object keys may be unquoted if they are legal ECMAScript identifiers
  • Objects and arrays may end with trailing commas.
  • Strings can be single-quoted, and multi-line string literals are allowed.

Usage is consistent with Python's built-in json module:

>>> import json5
>>> json5.loads('{"key1": "{my special value,}",}')
{'key1': '{my special value,}'}

It does come with a warning:

Known issues

  • Did I mention that it is SLOW?

It is still fast enough for loading start-up configuration and similar small files.

AnyDev
10

You can wrap Python's json parser with jsoncomment

JSON Comment allows to parse JSON files or strings with:

  • Single and Multi line comments
  • Multi line data strings
  • Trailing commas in objects and arrays, after the last item

Example usage:

import json
from jsoncomment import JsonComment

with open(filename) as data_file:    
    parser = JsonComment(json)
    data = parser.load(data_file)
Steve Lorimer
  • That package isn't very good. It removes commas from strings as well. Just have a string containing `,}` or `,]` and the commas will magically disappear. – Sven Sep 16 '17 at 12:41
  • As @Sven says, here's a test string to demo it: `{"key1": "{my special value,}"}`. – jpmc26 May 11 '18 at 03:51
  • @Sven Looks like they upgraded to a proper parse and abandoned regex: https://github.com/vaidik/commentjson/releases – rrauenza Nov 12 '19 at 19:08
  • If it has its own full-blown JSON parser why not just return the result of its parsing? This is parsing it, re-encoding it then parsing it again. – Arthur Tacca Sep 02 '20 at 12:44
9

Strip the commas before you pass the value in.

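import re

def clean_json(string):
    # Remove a comma that is followed by whitespace and a closing brace or bracket.
    # Note: `+` requires at least one whitespace character after the comma.
    string = re.sub(r",[ \t\r\n]+}", "}", string)
    string = re.sub(r",[ \t\r\n]+\]", "]", string)

    return string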
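
As the comments below point out, a regex like this cannot tell a structural comma from one inside a string. A minimal sketch of a string-aware alternative follows, assuming the input is otherwise valid JSON and contains no comments. `strip_trailing_commas` is a hypothetical helper name, not part of any library:

```python
import json

def strip_trailing_commas(text):
    """Drop commas that directly precede } or ], leaving string contents alone."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped, skip checks
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch in "}]":
            # Look back past whitespace; a comma found there is a trailing comma.
            i = len(out) - 1
            while i >= 0 and out[i] in " \t\r\n":
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)

raw = '{"key1": "{my special value, }", "items": [1, 2, ], }'
print(json.loads(strip_trailing_commas(raw)))
```

The actual parsing is still done by json.loads, so anything else that is invalid JSON is rejected as usual.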
andrewgrz
  • This might look okay, but it'll mangle inputs like `'{"foo": ",}"}'`. – user2357112 May 17 '14 at 03:51
  • Technically there would need to be a space in there, but yes, `", }"` would get mangled. – Rob Watts May 17 '14 at 04:53
  • Why have such a complicated regexp? Just use `',\s*}'` and `',\s*]'`. Then again, this method mangles strings, which is bad. The JsonComment package isn't much better as it uses a method similar to this. – Sven Sep 16 '17 at 12:45
  • No. You do not parse formats that allow nested elements using regular expressions. -1 – jpmc26 Dec 02 '17 at 12:16
  • @jpmc26 They're not parsing the whole tree, just a single piece of syntax which itself has no nesting. The issue is not the nesting but the matching of the quotes to determine they're not in a string which is possible, but painful. – Cramer May 11 '18 at 01:48
  • @Cramer It fails on this JSON: `{"key1": "{my special value, }"}` ([tio demo](https://tio.run/##hc7NCsIwDAfwe58i5LIOhyDeBj6JHWPOqNOuLf0Qxuiz10532M0cEv4hP4iZ/EOrY0rDaLT1YImxK92gl9Sp9um04s7bQd3LmkGuX4BTPty7cOFYnUF4YYVqdhErwKWt4i8QzSKajfgSSz5Yte4YM3l4vnmomPFF0wFrwHmcwBnqh07Cu5OBKogYi7JM6QM)). You do *not* make assumptions about the contents of complex formats like JSON. It's always a bad idea. Just don't do it. Use a properly tested parser and save yourself the heartburn. – jpmc26 May 11 '18 at 03:41
  • Also, `string` is a [standard module](https://docs.python.org/3.6/library/string.html). `s` would be a better variable name. – jpmc26 May 11 '18 at 03:45
  • You didn't read my comment did you? It's the being-inside-a-quote that's the issue. Is there a situation that breaks that is NOT inside a string? – Cramer May 12 '18 at 04:56
9

In Python you can have trailing commas inside dictionaries and lists, so we should be able to take advantage of this using ast.literal_eval:

import ast, json

# Avoid naming the variable `str`, which would shadow the built-in type.
s = '{"key1": "value1", "key2": "value2",}'

python_obj = ast.literal_eval(s)
# python_obj is {'key1': 'value1', 'key2': 'value2'}

json_str = json.dumps(python_obj)
# json_str is '{"key1": "value1", "key2": "value2"}'

However, JSON isn't exactly Python, so there are a few edge cases to this. For example, the values null, true and false don't exist in Python. We can replace those with valid Python equivalents before we run the eval:

import ast, json

def clean_json(text):
  # Plain text replacement: this also rewrites these words inside strings.
  text = text.replace('null', 'None').replace('true', 'True').replace('false', 'False')
  return json.dumps(ast.literal_eval(text))

This will unfortunately mangle any strings that have the words null, true, or false in them.

{"sentence": "show your true colors"} 

would become

{"sentence": "show your True colors"}
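
One way to avoid that mangling is sketched below, assuming Python 3.8+ and input that is close enough to JSON to parse as a Python expression. The text is parsed with ast.parse, the bare names null, true and false are swapped for their Python values, and only then is the tree handed to ast.literal_eval. `loads_lenient` and `_JsonNames` are hypothetical names chosen for this example:

```python
import ast

_JSON_NAMES = {"null": None, "true": True, "false": False}

class _JsonNames(ast.NodeTransformer):
    """Replace the bare names null/true/false with Python constants."""

    def visit_Name(self, node):
        if node.id in _JSON_NAMES:
            return ast.copy_location(ast.Constant(value=_JSON_NAMES[node.id]), node)
        return node  # any other name is left alone and rejected by literal_eval

def loads_lenient(text):
    tree = ast.parse(text.strip(), mode="eval")
    tree = _JsonNames().visit(tree)
    # literal_eval accepts an Expression node and only evaluates literals.
    return ast.literal_eval(tree)

print(loads_lenient('{"sentence": "show your true colors", "ok": true, "items": [1, null,],}'))
```

Because only Name nodes are rewritten, the word true inside a string is untouched. This is still not a JSON parser: string escapes follow Python's rules, so JSON-specific escapes such as `\/` are not handled the same way.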
Porkbutts
  • Mangling of the words null, true, or false in a string is one heck of a downside, so thanks for highlighting it. – jarmod Dec 08 '20 at 13:39
4

Cobbling together the knowledge from a few other answers, especially the idea of using literal_eval from @Porkbutts's answer, I present a wildly-evil solution to this problem:

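def json_cleaner_loader(path):
    namespace = {}
    with open(path) as fh:
        # Run the file as a Python assignment, with the JSON constants predefined.
        exec("null=None;true=True;false=False;d={}".format(fh.read()), namespace)
    return namespace["d"]

This works by defining the missing constants to be their Pythonic values before evaluating the JSON struct as Python code. The structure can then be read back from the namespace dictionary passed to exec. An explicit dictionary is used because names created by exec are not reliably visible through locals() inside a function on newer Python versions.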

This should work with both Python 2.7 and Python 3.x

BEWARE: this will execute whatever is in the passed file, which may do anything the Python interpreter can, so it should only ever be used on inputs which are known to be safe (i.e. don't let web clients provide the content) and probably not in any production environment.
This probably also fails if it's given a very large amount of content.


Late addendum: A side effect of this (awful) approach is that it supports Python comments within the JSON (JSON-like?) data, though it's hard to compare that to even friendly non-standard behavior.

ti7
3

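Use rapidjson (the python-rapidjson package). After `import rapidjson`, note that rapidjson.load expects an open file object rather than a path:

with open("file.json") as f:
    data = rapidjson.load(f, parse_mode=rapidjson.PM_COMMENTS | rapidjson.PM_TRAILING_COMMAS)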
user404906
1

If I don't have the option of using any external module, my typical approach is to first just sanitize the input (i.e. remove the trailing commas and comments) and then use the built-in JSON parser.

Here's an example that uses three regular expressions to strip single-line comments, multi-line comments and trailing commas from the JSON input string, and then passes the result to the built-in json.loads function.

#!/usr/bin/env python

import json, re, sys

unfiltered_json_string = '''
{
    "name": "Grayson",
    "age": 45,
    "car": "A3",
    "flag": false,
    "default": true,
    "entries": [ // "This is the beginning of the comment with some quotes" """""
        "red", // This is another comment. " "" """ """"
        null, /* This is a multi line comment //
"Here's a quote on another line."
*/
        false,
        true,
    ],
    "object": {
        "key3": null,
        "key2": "This is a string with some comment characters // /* */ // /////.",
        "key1": false,
    },
}
'''

RE_SINGLE_LINE_COMMENT = re.compile(r'("(?:(?=(\\?))\2.)*?")|(?:\/{2,}.*)')
RE_MULTI_LINE_COMMENT = re.compile(r'("(?:(?=(\\?))\2.)*?")|(?:\/\*(?:(?!\*\/).)+\*\/)', flags=re.M|re.DOTALL)
RE_TRAILING_COMMA = re.compile(r',(?=\s*?[\}\]])')

if sys.version_info < (3, 5):
    # For Python versions before 3.5, use the patched copy of re.sub.
    # Based on https://gist.github.com/gromgull/3922244
    def patched_re_sub(pattern, repl, string, count=0, flags=0):
        def _repl(m):
            class _match():
                def __init__(self, m):
                    self.m=m
                    self.string=m.string
                def group(self, n):
                    return m.group(n) or ''
            return re._expand(pattern, _match(m), repl)
        return re.sub(pattern, _repl, string, count=count, flags=flags)
    filtered_json_string = patched_re_sub(RE_SINGLE_LINE_COMMENT, r'\1', unfiltered_json_string)
    filtered_json_string = patched_re_sub(RE_MULTI_LINE_COMMENT, r'\1', filtered_json_string)
else:
    filtered_json_string = RE_SINGLE_LINE_COMMENT.sub(r'\1', unfiltered_json_string)
    filtered_json_string = RE_MULTI_LINE_COMMENT.sub(r'\1', filtered_json_string)
filtered_json_string = RE_TRAILING_COMMA.sub('', filtered_json_string)

json_data = json.loads(filtered_json_string)
print(json.dumps(json_data, indent=4, sort_keys=True))
Grayson Lang