48

When I load a number in e-notation from a JSON dump with YAML, the number is loaded as a string and not as a float.

I think this simple example can explain my problem.

In [1]: import json

In [2]: import yaml

In [3]: All = {'one':1,'low':0.000001}

In [4]: jAll = json.dumps(All)

In [5]: yAll = yaml.safe_load(jAll)

In [6]: yAll
Out[6]: {'low': '1e-06', 'one': 1}

Why does YAML load 1e-06 as a string and not as a number, and how can I fix it?

Anthon
Oren
  • possible duplicate of [Disable scientific notation in python json.dumps output](http://stackoverflow.com/questions/18936554/disable-scientific-notation-in-python-json-dumps-output) – SiHa May 26 '15 at 14:37
  • @SiHa That might be a way to avoid the issue, but the real problem is that YAML is supposed to be a superset of JSON and `1e-06`, as you get out of `json.dumps()`, **is** a correct JSON number and AFAICT also a correct YAML number. PyYAML just doesn't parse it correctly. – Anthon May 26 '15 at 14:56
  • OK, was just a thought... – SiHa May 26 '15 at 15:05
  • @Oren, I further updated my answer, as the original pattern I proposed could have a problem matching numbers without dot or exponential part. ruamel.yaml parses these JSON numbers correctly without any additional patching. – Anthon May 27 '15 at 10:55
  • @Oren just edit your yaml file from `1e-3` to `1.0e-3` – Koo Apr 09 '21 at 17:13
  • Hi @Koo the json was created automatically from a pipeline.. – Oren Apr 09 '21 at 23:11

3 Answers

43

The problem lies in the fact that the PyYAML Resolver is set up to match floats as follows:

Resolver.add_implicit_resolver(
    u'tag:yaml.org,2002:float',
    re.compile(u'''^(?:[-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+][0-9]+)?
    |\\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]*
    |[-+]?\\.(?:inf|Inf|INF)
    |\\.(?:nan|NaN|NAN))$''', re.X),
    list(u'-+0123456789.'))

whereas the YAML spec specifies the regex for scientific notation as:

-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?

The latter makes the dot optional, whereas the re.compile() pattern in the implicit resolver above requires it.
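To see the effect of the required dot without installing anything, the PyYAML float pattern quoted above can be tried directly with the `re` module. This is only a sketch: `resolves_as_float` is a hypothetical helper name, and it checks the regex alone, ignoring the first-character lookup that the real resolver performs before matching.

```python
import re

# Float pattern copied from the PyYAML implicit resolver quoted above.
PYYAML_FLOAT = re.compile(r'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
    |\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
    |[-+]?\.(?:inf|Inf|INF)
    |\.(?:nan|NaN|NAN))$''', re.X)


def resolves_as_float(scalar):
    """Return True if the pattern would tag this plain scalar as a float."""
    return PYYAML_FLOAT.match(scalar) is not None


for text in ('1e-06', '1.0e-06', '1.0e6', '1.0e+6'):
    print(text, resolves_as_float(text))
# 1e-06 False    (no dot in the mantissa)
# 1.0e-06 True
# 1.0e6 False    (no sign in the exponent)
# 1.0e+6 True
```

Scalars that do not match fall through to the string type, which is why `1e-06` comes back as `'1e-06'`.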

The matching of floats can be fixed so it will accept floating point values with an e/E but without decimal dot and with exponents without sign (i.e. + implied):

import yaml
import json
import re

All = {'one':1,'low':0.000001}

jAll = json.dumps(All)

loader = yaml.SafeLoader
loader.add_implicit_resolver(
    u'tag:yaml.org,2002:float',
    re.compile(u'''^(?:
     [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
    |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
    |\\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]*
    |[-+]?\\.(?:inf|Inf|INF)
    |\\.(?:nan|NaN|NAN))$''', re.X),
    list(u'-+0123456789.'))

data = yaml.load(jAll, Loader=loader)
print('data', data)

results in:

data {'one': 1, 'low': 1e-06}

There is a discrepancy between what JSON allows in numbers and the regex in the YAML 1.2 spec (concerning the required dot in the number and e being lower case). The JSON specification is IMO very clear in that it requires neither the dot before 'e/E' nor a sign after the 'e/E':

[Image: diagram of the number grammar, taken from the JSON specification]

The PyYAML implementation matches floats partly according to the JSON spec and partly according to the regex in the YAML spec, and as a result fails on numbers that should be valid.
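As a rough illustration of the JSON side, the number grammar can be written as a regex and compared with what Python's own `json` module accepts. The pattern below is my transcription of the grammar, and `is_json_number` is a hypothetical helper name, so treat it as a sketch rather than a reference implementation.

```python
import json
import re

# JSON number grammar: optional minus, integer part, optional fraction,
# optional exponent whose sign is itself optional.
JSON_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?')


def is_json_number(text):
    """Return True if the whole string is a valid JSON number."""
    return JSON_NUMBER.fullmatch(text) is not None


for text in ('1e-06', '1E6', '1.0e-06', '1.e-06', '.5'):
    print(text, is_json_number(text))
# 1e-06 True
# 1E6 True
# 1.0e-06 True
# 1.e-06 False   (a fraction needs at least one digit)
# .5 False       (an integer part is required)

print(json.loads('1e-06'), json.loads('1E6'))  # 1e-06 1000000.0
```

Both `1e-06` and `1E6` are valid JSON, so a parser that claims to read JSON as a subset should return floats for them.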

ruamel.yaml (which is my enhanced version of PyYAML) has this updated pattern and works correctly:

import json
from ruamel.yaml import YAML

All = {'one':1,'low':0.000001}

jAll = json.dumps(All)

# current ruamel.yaml API: load through a YAML() instance
yaml = YAML(typ='safe')
data = yaml.load(jAll)
print('data', data)

with output:

data {'one': 1, 'low': 1e-06}

ruamel.yaml also accepts the number '1.0e6', which PyYAML also sees as a string.

Leopd
Anthon
  • If I understand correctly, this is objectively a bug in PyYAML? Have you submitted a pull request fixing it? – Mark Amery Jun 16 '15 at 12:17
  • @MarkAmery I submitted a PR for PyYAML last year that reintegrated the two code branches (Python2 and Python3) without any form of reaction. That project is currently hibernating at best and I'm not going to waste my time on PRs for PyYAML until it reawakens. I later forked and went on with fixes (also some outstanding on PyYAML), because I had to move forward and could no longer wait. I think this is a bug, as it implements neither the principle that YAML is a superset of JSON nor the exact regex given in the YAML spec. With this change all existing PyYAML unit tests passed when I tried. – Anthon Jun 16 '15 at 12:28
  • @MarkAmery To be clear, I would much rather have fixed those bugs in PyYAML itself, then forked the source for the extra functionality (not acceptable to PyYAML) and kept the two in sync. – Anthon Jun 16 '15 at 12:32
  • @Anthon Thanks a lot for your efforts. I am doing some scientific calculations with configuration in YAML. Having the possibility to write numbers in scientific notation helps a lot. – dotcs Oct 23 '15 at 07:07
  • @MarkAmery and Anthon it seems this PR solves the issue: https://github.com/yaml/pyyaml/pull/174 – jhagege Jul 18 '18 at 08:58
  • @cyberjoac Actually it doesn't. One should only apply those rules when parsing YAML 1.2 and not when parsing YAML 1.1. There are also no tests added in that commit that test proper behavior. – Anthon Jul 18 '18 at 09:11
  • How did you draw the image? – Saddle Point Apr 21 '21 at 15:17
  • Thank you for pointing me to `ruamel.yaml`. Should be used now instead of PyYAML in the Python ecosystem IMO. – Torsten Bronger Sep 18 '21 at 04:27
  • @SaddlePoint I took it from the JSON specification (linked in the answer) and put a simple alpha channel on it making the "outside" transparent. – Anthon Nov 17 '21 at 08:52
24

Writing the number as

1.0e-1

or

1.0E-1

solved my problem. My code to read the YAML file looks like this:

import yaml


def read_config(path: str):
    """read yaml file"""
    with open(path, 'r') as f:
        data = yaml.safe_load(f)
    return data
Jason Lin
11

I am new to YAML, so I have no idea what is best, but writing either

1.0e-1

or

1.0E-1

in my YAML file worked out of the box. That is, give the coefficient a decimal point (without it, I also got strings).
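If the values are written by a script rather than by hand, the same convention can be applied when formatting them. A minimal sketch, assuming you control the code that writes the numbers; `yaml_float_text` is a hypothetical helper, not part of PyYAML:

```python
import math


def yaml_float_text(value):
    """Format a float so that the mantissa always contains a decimal point."""
    value = float(value)
    if math.isnan(value):
        return '.nan'
    if math.isinf(value):
        return '.inf' if value > 0 else '-.inf'
    text = repr(value)
    mantissa, sep, exponent = text.partition('e')
    if sep and '.' not in mantissa:
        # repr(1e-06) is '1e-06'; insert '.0' to get '1.0e-06'.
        text = mantissa + '.0e' + exponent
    return text


print(yaml_float_text(1e-06))  # 1.0e-06
print(yaml_float_text(0.1))    # 0.1
print(yaml_float_text(1e22))   # 1.0e+22
```

Python's `repr` always puts a sign in the exponent, so the resulting text has both the dot and the signed exponent that the stricter pattern expects.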

Francisco C