Is there a way to determine whether a file is in YAML or JSON format?

Question

I have a Python test script that requires a configuration file. The configuration file is expected to be in JSON format.

But some of the users of my test script dislike the JSON format because it's unreadable.

So I changed my test script so that it expects the configuration file in YAML format, then converts the YAML file to a JSON file.

I would prefer that the function that loads the configuration file handle both JSON and YAML. Is there a method in either the yaml or json module that returns a Boolean telling me whether the configuration file is JSON or YAML?

My workaround right now is to use two try/except clauses:

import os
import sys
import json
import yaml

# This is the configuration file - my script gets it from argparser but in
# this example, let's just say it is some file that I don't know what the format
# is
config_file = "some_config_file"

in_fh = open(config_file, "r")

config_dict = dict()
valid_json = True
valid_yaml = True

try:
    config_dict = json.load(in_fh)
except:
    print("Error trying to load the config file in JSON format")
    valid_json = False

# json.load() consumed the file, so rewind before the second attempt
in_fh.seek(0)

try:
    config_dict = yaml.load(in_fh)
except:
    print("Error trying to load the config file in YAML format")
    valid_yaml = False

in_fh.close()

if not valid_yaml and not valid_json:
    print("The config file is neither JSON nor YAML")
    sys.exit(1)

Now, there is a Python module I found on the Internet called isityaml that can be used to test for YAML. But I'd prefer not to install another package because I have to install this on several test hosts.

Do the json and yaml modules have a method that returns a Boolean indicating whether the input is in their respective format?

config_file = "sample_config_file"

# I would like some method like this
if json.is_json(in_fh):
    config_dict = json.load(in_fh)

Isn't YAML a superset of JSON? You should be able to just load the file as YAML unconditionally. (I'm not sure whether it's an exact superset - I think previous versions weren't.) — user2357112, Jun 03 '17 at 00:02
Couldn't you just require that YAML files have one extension and JSON files have a different one? — user2357112, Jun 03 '17 at 00:03
user2357112, there are two problems. 1) Some users might name their configuration file without a .yml or .json suffix so I can't go by the suffix in their configuration file 2) Just because a file has a .yml suffix doesn't necessarily mean that the file is in the YAML format. — SQA777, Jun 03 '17 at 01:55
user2357112, I tested loading a json file using yaml.load and loading a yaml file using json.load and both asserted (this was outside a try/except block) — SQA777, Jun 03 '17 at 01:57
**Do not use** PyYAML's `load()` on uncontrolled data; it is *unsafe* (e.g. you can get your disk wiped). — Anthon, Jun 03 '17 at 10:30
@user2357112 That only applies to YAML 1.2, and the OP is using PyYAML (deduced from the use of `import yaml`), which doesn't support YAML 1.2, only 1.1 — Anthon, Jun 03 '17 at 10:36

Anthon · Answer 1 · 2020-03-18T19:16:05.950

From your

import yaml

I conclude that you use the old PyYAML. That package only supports YAML 1.1 (from 2005), and the format specified there is not a full superset of JSON. With YAML 1.2 (released in 2009), the YAML format became a superset of JSON.

The package ruamel.yaml (disclaimer: I am the author of that package) supports YAML 1.2. You can install it in your Python virtual environment with pip install ruamel.yaml. By replacing PyYAML with ruamel.yaml (rather than adding a package), you can just do:

import os
from ruamel.yaml import YAML

config_file = "some_config_file"

yaml = YAML()
with open(config_file, "r") as in_fh:
    config_dict = yaml.load(in_fh)

This loads the file into config_dict without caring whether the input is YAML or JSON, so there is no need to test for either format.

Josh Kelley · Accepted Answer · 2017-06-04T03:07:54.227

4

From looking at the json and yaml modules' documentation, it looks like they don't offer any appropriate functions. However, a common Python idiom is EAFP ("easier to ask forgiveness than permission"); in other words, go ahead and try to do the operation, and deal with exceptions if they arise.

import json
import yaml

def load_config(config_file):
    with open(config_file, "r") as in_fh:
        # Read the file into memory as a string so that we can try
        # parsing it twice without seeking back to the beginning and
        # re-reading.
        config = in_fh.read()

    config_dict = dict()
    valid_json = True
    valid_yaml = True

    try:
        config_dict = json.loads(config)
    except:
        print("Error trying to load the config file in JSON format")
        valid_json = False

    try:
        config_dict = yaml.safe_load(config)
    except:
        print("Error trying to load the config file in YAML format")
        valid_yaml = False

    # Hand the parsed configuration back to the caller.
    return config_dict
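
A variation on the function above, offered as a minimal sketch rather than a drop-in replacement: it stops at the first parser that succeeds and catches only parser-specific exceptions. The name load_config_text is hypothetical, and the YAML fallback assumes PyYAML is installed (it is imported lazily, so the JSON path runs without it).

```python
import json


def load_config_text(text):
    """Hypothetical helper: try JSON first, then YAML.

    Returns a (format_name, data) tuple.
    """
    try:
        return "json", json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        pass

    # Assumption: PyYAML is available. Imported here so that inputs
    # that are valid JSON never need it.
    import yaml

    try:
        return "yaml", yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError("config is neither valid JSON nor valid YAML") from exc


print(load_config_text('{"name": "demo", "retries": 3}'))
# ('json', {'name': 'demo', 'retries': 3})
```

Trying JSON first is a deliberate choice here: JSON is the stricter format, so anything it accepts is reported as JSON, and everything else falls through to the YAML parser.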

You could make your own is_json or is_yaml function if you wanted. This would involve processing the configuration twice, but that may be okay for your purposes.

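def try_as(loader, s, on_error):
    # Return True if loader(s) parses without raising on_error.
    try:
        loader(s)
        return True
    except on_error:
        return False

def is_json(s):
    return try_as(json.loads, s, ValueError)

def is_yaml(s):
    # yaml.YAMLError is the base class of PyYAML's scanner and parser errors.
    return try_as(yaml.safe_load, s, yaml.YAMLError)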
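
One caveat with helpers like these: a successful parse does not mean the result is usable as a configuration, because a bare scalar is valid JSON (and valid YAML) too. Here is a minimal sketch of a stricter check, assuming the configuration is expected to be a mapping. The name parses_to_mapping is hypothetical, and only the json module is used so the example runs without PyYAML.

```python
import json


def parses_to_mapping(loader, s, on_error):
    """Hypothetical helper: True only if loader(s) succeeds and yields a dict."""
    try:
        return isinstance(loader(s), dict)
    except on_error:
        return False


print(parses_to_mapping(json.loads, '{"host": "example"}', ValueError))  # True
print(parses_to_mapping(json.loads, '42', ValueError))                   # False: valid JSON, but a number
print(parses_to_mapping(json.loads, 'host: example', ValueError))        # False: not JSON at all
```

The same function should work with yaml.safe_load and yaml.YAMLError as the loader and exception, if PyYAML is installed.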

Finally, as @user2357112 alluded to, "every JSON file is also a valid YAML file" (as of YAML 1.2), so you should be able to unconditionally process everything as YAML (assuming you have a YAML 1.2-compatible parser; PyYAML, which provides Python's yaml module, isn't one).

edited Jun 04 '17 at 03:07

answered Jun 03 '17 at 02:38


Your last statement doesn't really apply, as the OP's `import yaml` refers to PyYAML, which only supports the older YAML 1.1 specification. – Anthon Jun 03 '17 at 10:35
Doing `try` and `except` without specifying the exceptions is bad practice. The exception to catch in the case of JSON is **`ValueError`**. The yaml module doesn't even raise exceptions. – Ricardo Branco Jun 04 '17 at 02:11
UnicodeDecodeError is a subclass of UnicodeError, which itself is a subclass of ValueError. When your script receives Ctrl-C (KeyboardInterrupt), for example, you wouldn't want the `except` block in your load_config() function to handle that. – Ricardo Branco Jun 04 '17 at 03:01
I tried `with open("/etc/passwd") as f: d = yaml.safe_load(f)` and no exception was triggered. Anyway, I would use yaml.scanner.ScannerError as an exception to catch. – Ricardo Branco Jun 04 '17 at 03:02
@RicardoBranco - /etc/passwd is parsed as a single YAML string. `yaml.safe_load('a: b: c')` will throw an exception. – Josh Kelley Jun 04 '17 at 03:10
Thanks for the tip. – Ricardo Branco Jun 04 '17 at 03:11
Thanks @RicardoBranco. I will specify the exceptions in the except clause. – SQA777 Jun 07 '17 at 06:12

score 0 · Answer 3 · answered Aug 29 '20 at 16:49

Years later I ran into the same problem. I fully agree with EAFP, but I still wanted a better way to detect whether a configuration file is in JSON or YAML format. My code has methods that tell the user where they made a mistake in a JSON file and where in a YAML file. try/except did not handle this the way I wanted, and my eyes bleed when I see those nested blocks.

This is not perfect and still has minor issues, but the basic concept fits my needs. I'd call it "good enough".

My solution: look for standalone commas in the configuration file. If the file contains standalone commas (the separators in JSON), it is a JSON file; if no such commas are found, it is YAML. In my YAML files I use commas only in comments (between " ") and in lists (between [ ]). Maybe someone will find it useful.

import re
from pathlib import Path

# Find all commas which are standalone
#  - not between quotes - comments, answers
#  - not between brackets - lists
commas = re.compile(r',(?=(?![\"]*[\s\w\?\.\"\!\-\_]*,))(?=(?![^\[]*\]))')

def detect_format(file_path):
    # Wrapped in a function so that the return statement is valid
    signs = commas.findall(Path(file_path).read_text())
    return "json" if len(signs) > 0 else "yaml"

file_path = Path("example_file.cfg")
print(detect_format(file_path))
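
To illustrate how this heuristic behaves, here is a small sketch that applies the same regular expression to strings instead of a file. The helper name guess_format and the sample inputs are made up for illustration. The last call shows one of the gaps: a JSON object with a single key contains no commas at all, so it is reported as YAML.

```python
import re

# Same pattern as in the answer: commas that are not followed by another
# comma within plain text, and not followed by a closing bracket.
commas = re.compile(r',(?=(?![\"]*[\s\w\?\.\"\!\-\_]*,))(?=(?![^\[]*\]))')


def guess_format(text):
    """Hypothetical helper: 'json' if a standalone comma is found, else 'yaml'."""
    return "json" if commas.search(text) else "yaml"


print(guess_format('{"host": "example", "port": 8080}'))   # json
print(guess_format('host: example\nports: [80, 443]\n'))   # yaml
print(guess_format('{"host": "example"}'))                 # yaml (misclassified)
```

Because of cases like the last one, it may be safer to treat the result as a hint for choosing which error message to show, and still let a real parser decide whether the file is valid.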

score 0 · Answer 4 · answered Sep 11 '22 at 23:13

I don't know if this has been answered already, but here is a way to do it:

import json
import pathlib

import yaml

def input_parameters(file):
    default_ext = '.json'  # set a default extension
    file_ext = pathlib.Path(file).suffix
    with open(file, 'r') as f:
        # Pick the parser from the file extension alone
        if file_ext == default_ext:
            input_file = json.load(f)
        else:
            input_file = yaml.safe_load(f)
    return input_file
