
Can anyone suggest a JSON parser with PHP bindings that allows any kind of comments? I badly need comments for config files, but json_decode doesn't support them.

(I am aware of: 1. other formats such as YAML, 2. comments not being part of the standard)

Update:

Why don't we use:

  • YAML: benchmarks show it's slower, and we might want to send the data over the wire; not sure YAML is best for that.

  • XML: too verbose, and simple human editing is a requirement. We also don't need XML's extended features.

  • INI: there is hierarchy and nesting of variable depth in the data. And we need a ubiquitous format as the data might be distributed with apps or work with apps in other languages.

  • Pre-processing: data can be contributed and shared by users, so it's tough to require a pre-processing step before adding data to an app.

Basel Shishani
  • What are the comments for? Human or computer reading? – heldt Nov 16 '11 at 08:43
  • You could run the JSON string through a pre-processing parser which strips all comments – knittl Nov 16 '11 at 08:44
  • I can only second Gordon. Use a format which can give you what you expect from it, instead of raping another format which is just not for that purpose in its current state. – kapa Nov 16 '11 at 08:50
  • If they're for config files, then why not use parse_ini_file compatible .ini files? or create your config files in PHP itself? Both support comments and would be cheaper to parse than JSON using methods other than json_decode. – GordonM Nov 16 '11 at 08:55
  • @bazmegakapa: It's not raping - it's consensual:) The standard does allow for parsers to support extensions. – Basel Shishani Nov 16 '11 at 09:19
  • @heldt: comments are for humans - for now - in any case, we would like the parser to ignore them gracefully. – Basel Shishani Nov 16 '11 at 09:40
  • [json5](https://github.com/json5/json5), end of – aross Dec 02 '21 at 14:23

5 Answers


YAML

If you need portability and don't want any pre-processing or non-standard syntax, then YAML is probably the way to go. Beware, though, of the dangers and caveats of YAML.

Most, if not all, JSON is also valid YAML (YAML 1.2 is a superset of JSON), and YAML supports comments, so the initial switch is easy.

JSON with comments

I recently needed to migrate from INI files in PHP to something that has support for integers and booleans, but still supported comments as well.

JSON seemed like a good format, except for its lack of comment support. If you want to make this work, you don't need a whole custom JSON parser: a simple wrapper that strips the comments and then calls the native json_decode is enough. (This works for sane content authored by trusted people. If you allow arbitrary input, there is probably a way to break this.)

Code from github.com/countervandalism/stillalive, with the regex from @makaveli_lcf:

class JsonUtil {
    /**
     * From https://stackoverflow.com/a/10252511/319266
     * @param string $filename
     * @return array|false|null False if the file can't be read, null if the JSON is invalid
     */
    public static function load( $filename ) {
        $contents = @file_get_contents( $filename );
        if ( $contents === false ) {
            return false;
        }
        return json_decode( self::stripComments( $contents ), true );
    }
    /**
     * From https://stackoverflow.com/a/10252511/319266
     * @param string $str
     * @return string
     */
    protected static function stripComments( $str ) {
        return preg_replace( '![ \t]*//.*[ \t]*[\r\n]!', '', $str );
    }
}
Timo Tijhof
  • What are the leading and trailing `!` in the regex for? – John Archer Mar 04 '20 at 08:43
  • @JohnArcher Good question! The `!` bangs here serve as the regex delimiters – taking the place of the traditional `/` slashes. My regex here includes slashes itself, which means those would otherwise need to be escaped. You might also sometimes see `#`, `%` or `~` used as delimiters. See also [php.net/regexp.reference.delimiters](https://www.php.net/regexp.reference.delimiters). – Timo Tijhof Mar 05 '20 at 23:11
  • That is interesting, thanks a lot. I tried this on regex101.com, but it does not work there. See this example: https://regex101.com/r/q6SFFT/1 If I fix the "unescaped delimiter" problem the site does not recognize `!` (or other delimiters) as delimiters but matches the character literally. Is this because the site simply does not support this, but PHP's `preg_replace()` does? – John Archer Mar 06 '20 at 08:22
  • @JohnArcher The delimiters are not part of the pattern, they are only an encoding and do not change the behaviour of the regex. The website at your link has the delimiters already set (in light grey). This means a bang `!` within that, would be treated as literal part of the pattern, not as delimiter. Effectively, your link is running `/!…!/`, instead of `!…!` or `/…/`. Similarly, if I were to enter `/` there, it would also fail. As that would be `//…//` instead of `/…/`. See https://regex101.com/r/8Smo5h/1 and https://regex101.com/r/NVY6f1/1. – Timo Tijhof Mar 06 '20 at 19:33
  • Note that the regular expression in this post doesn't handle strings that have comments embedded in them; `{ "hello": "world // blah" }` would replace to `{ "hello": "world`, which is invalid JSON. – Ruby Tunaley Feb 05 '21 at 04:17
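The caveat in the last comment is easy to reproduce. This self-contained sketch copies the pattern from JsonUtil::stripComments() into a hypothetical helper, strip_line_comments(). It shows a `//` inside a string value being treated as a comment, a normal comment line being removed as intended, and a trailing comment with no final newline being left in place (the pattern requires a line break after the comment).

```php
<?php
// Hypothetical helper using the same pattern as JsonUtil::stripComments().
function strip_line_comments(string $str): string {
    return preg_replace('![ \t]*//.*[ \t]*[\r\n]!', '', $str);
}

// 1. A "//" inside a string value is mistaken for the start of a comment.
$inString = "{ \"hello\": \"world // blah\" }\n";
echo strip_line_comments($inString), PHP_EOL;             // { "hello": "world
var_dump(json_decode(strip_line_comments($inString)));   // NULL (invalid JSON)

// 2. A comment on its own line, followed by a newline, is removed as intended.
var_dump(json_decode(strip_line_comments("{\"a\": 1}\n// note\n"), true));

// 3. A trailing comment without a final newline is not matched at all.
var_dump(json_decode(strip_line_comments("{\"a\": 1} // note")));   // NULL
```

For trusted, hand-written config files these cases may never come up, but a URL such as `"http://..."` in a value is enough to trigger the first one.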

You can use the following function to decode commented json:

function json_decode_commented($data, $assoc = false, $maxDepth = 512, $opts = 0) {
  $data = preg_replace('~
    (" (?:\\\\. | [^"])*+ ") | \# [^\v]*+ | // [^\v]*+ | /\* .*? \*/
  ~xs', '$1', $data);

  return json_decode($data, $assoc, $maxDepth, $opts);
}

It supports all PHP-style comments: `/* */`, `#` and `//`. String values are preserved as-is.
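A quick usage sketch for json_decode_commented(), with the function repeated so the snippet runs on its own. The config text is made up: it mixes all three comment styles and includes a string value containing both `//` and `#`, to show that string literals are matched first by the capture group and passed through untouched.

```php
<?php
// Function repeated from the answer above so this snippet is self-contained.
function json_decode_commented($data, $assoc = false, $maxDepth = 512, $opts = 0) {
  $data = preg_replace('~
    (" (?:\\\\. | [^"])*+ ") | \# [^\v]*+ | // [^\v]*+ | /\* .*? \*/
  ~xs', '$1', $data);

  return json_decode($data, $assoc, $maxDepth, $opts);
}

// Hypothetical config using block, line and hash comments.
$config = <<<'JSON'
{
  /* block comment */
  "url": "https://example.com/#top", // line comment
  # hash comment
  "retries": 3
}
JSON;

// Strings are kept verbatim; only the comments are replaced by '' (empty $1).
var_dump(json_decode_commented($config, true));
```

If you only want the JavaScript-style comments that VS Code's JSONC accepts, drop the `| \# [^\v]*+` alternative, as discussed in the comments.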

Alexander Shostak
  • This is a better solution than the accepted one. btw, I've created a package and use this solution instead of my own: unional/jsonc – unional Feb 22 '21 at 00:00
  • Actually, I notice `| \# [^\v]*+` is not necessary and breaks certain cases. Is there a reason to have that? – unional Feb 22 '21 at 00:10
  • @unional, the only reason is to support # ..... comment till line end .... It should not break anything, because strings are already excluded from processing. – Alexander Shostak Feb 22 '21 at 16:18
  • I have a test case that doesn’t work. You can try it on the just-func.jsonc in GitHub justland/just-func under the json-schema folder. – unional Feb 22 '21 at 16:53
  • @unional, thank you for bug reporting. Changed the code to improve escape character handling: \ will be skipped in string literals now. – Alexander Shostak Feb 22 '21 at 20:21
  • You are welcome. Is it common to use `#` for comment in jsonc? I use VSCode and also as in json (JavaScript), only `//` and `/* */` are acceptable comment formats. Btw, I have created a jsonc package and used your implementation instead of the one I have. Will add your name on it for attribution. :) https://github.com/unional/jsonc-php – unional Feb 22 '21 at 20:30
  • It's a bash/python style comments. You may remove "| \# [^\v]*+" part to exclude their support. Anyway, it may be better to support all-style comments in generic function. Thanks. – Alexander Shostak Feb 22 '21 at 20:39
  • I found myself needing to modify VSCode workspace files programmatically, it would be wonderful if there was a way to retain the comments – bilogic Dec 16 '22 at 07:09
  • I think `$data` should be renamed as `$json`, correct? – bilogic Dec 16 '22 at 07:19
  • You can give it any name. It's a data string in json format, right. – Alexander Shostak Apr 23 '23 at 02:32

I'm surprised nobody mentioned json5:

{
  // comments
  unquoted: 'and you can quote me on that',
  singleQuotes: 'I can use "double quotes" here',
  lineBreaks: "Look, Mom! \
No \\n's!",
  hexadecimal: 0xdecaf,
  leadingDecimalPoint: .8675309, andTrailing: 8675309.,
  positiveSign: +1,
  trailingComma: 'in objects', andIn: ['arrays',],
  "backwardsCompatible": "with JSON",
}
aross

Another option is to allow your users to insert comments as unused fields in the JSON structure:

{
  "color": "red",
  "color//": "may be red, green or blue"
}

If your JSON is only ever read as input and never machine-saved, you could abuse the format by repeating the same comment field. This conveniently wipes most of the comments on parsing, because parsers usually keep only the first or the last value of a duplicated key (PHP's json_decode keeps the last):

{
  "color": "red",      "//":"may be red, green or blue",
  "shape": "circle",   "//":"use circle, square or triangle",
  "timeout": 5,        "//":"timeout in seconds; default is 10"
}
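A minimal sketch of the reading side, assuming the convention above that any key ending in `//` is a comment. decode_without_comment_keys() and strip_comment_keys() are hypothetical helper names. Because PHP's json_decode keeps the last value of a duplicated key, the repeated `"//"` fields collapse into a single entry, which the helper then drops along with any `"name//"` keys, recursively.

```php
<?php
// Hypothetical helpers: decode JSON, then drop "comment" keys that end in "//".
// Assumes the top-level value is a JSON object or array.
function decode_without_comment_keys(string $json): array {
    // JSON_THROW_ON_ERROR (PHP 7.3+) turns syntax errors into a JsonException.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object or array');
    }
    return strip_comment_keys($data);
}

function strip_comment_keys(array $data): array {
    // A by-value foreach is not affected by changes to $data, so unset() is safe here.
    foreach ($data as $key => $value) {
        if (is_string($key) && substr($key, -2) === '//') {
            unset($data[$key]);                        // comment field, e.g. "color//" or "//"
        } elseif (is_array($value)) {
            $data[$key] = strip_comment_keys($value);  // nested object or array
        }
    }
    return $data;
}

$config = decode_without_comment_keys('{
  "color": "red",      "//": "may be red, green or blue",
  "shape": "circle",   "//": "use circle, square or triangle",
  "timeout": 5,        "//": "timeout in seconds; default is 10"
}');
var_dump($config);   // only color, shape and timeout remain
```

Stripping the keys explicitly means the application never sees the comment fields, whichever duplicate-key behaviour the parser has.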

Comments are not part of JSON, so a "JSON parser" is not required to accept comments.

I'd use YAML. Even if parsing is slightly slower (PHP has a native JSON parser but no native YAML parser), the difference is probably negligible, and if it's not, you can always cache the parsed object. Besides that, since PHP's JSON parser does not support comments, you'd have to use a non-native one, so it most likely wouldn't be faster than the YAML parser (assuming both are well-written).
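The "cache the parsed object" idea could look roughly like this, assuming the cache location is writable. load_config_cached() is a hypothetical helper: it accepts any parser callable (a YAML parser or a comment-stripping JSON decoder would both fit), writes the result out as a PHP file via var_export(), and reuses that file while it is at least as new as the source. It is not safe against concurrent writers and only suits data that var_export() can rebuild, such as nested arrays of scalars.

```php
<?php
// Hypothetical helper: parse a config file once and cache the result as PHP code.
function load_config_cached(string $source, string $cacheFile, callable $parse): array {
    clearstatcache();   // make sure filemtime() sees current timestamps
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($source)) {
        return require $cacheFile;   // cache hit: no parsing at all
    }
    $contents = file_get_contents($source);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $source");
    }
    $data = $parse($contents);
    // var_export() produces PHP source that rebuilds arrays of scalars.
    file_put_contents($cacheFile, '<?php return ' . var_export($data, true) . ';' . PHP_EOL);
    return $data;
}

// Usage, with plain json_decode standing in for the real (e.g. YAML) parser.
$source = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($source, '{"debug": true, "level": 3}');
$cache = $source . '.cache.php';
$parses = 0;
$parser = function (string $text) use (&$parses): array {
    $parses++;
    return json_decode($text, true);
};
$first = load_config_cached($source, $cache, $parser);    // parses and writes the cache
$second = load_config_cached($source, $cache, $parser);   // served from the cache
unlink($source);
unlink($cache);
```

With OPcache enabled, the generated file is also kept as compiled bytecode, which is one reason caching config as PHP is a common pattern.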

ThiefMaster
  • A JSON encoder MUST NOT output comments. A JSON decoder MAY accept and ignore comments. http://blog.getify.com/2010/06/json-comments/ . If the feature is useful there's no harm in siding with the designer of the spec. – Basel Shishani Nov 16 '11 at 09:54