32

I have a JSON file (meta.json) like this:

{
    "main": {
        "title": "今日は雨が降って",
        "description": "今日は雨が降って"
    }
}

I would like to convert it to a YAML file (meta.yaml) like this:

title: "今日は雨が降って"
description: "今日は雨が降って"

What I have done so far:

import simplejson as json
import yaml  # PyYAML is imported as "yaml", not "pyyaml"

f = open('meta.json', 'r')
jsonData = json.load(f)
f.close()

ff = open('meta.yaml', 'w+')
yamlData = {'title':'', 'description':''}
yamlData['title'] = jsonData['main']['title']
yamlData['description'] = jsonData['main']['description']
yaml.dump(yamlData, ff)
ff.close()
# So, as you can see, all I need are the values under "main" in meta.json

But sadly, what I got is the following:

{description: "\u4ECA\u65E5\u306F\u96E8\u304C\u964D\u3063\u3066", title: "\u4ECA\u65E5\
\u306F\u96E8\u304C\u964D\u3063"}

Why?

Burhan Khalid
holys

5 Answers

41

PyYAML's yaml.dump() has an allow_unicode option that defaults to None, in which case all non-ASCII characters in the output are escaped. With allow_unicode=True, it writes the Unicode characters as-is.

yaml.dump(data, ff, allow_unicode=True)

Bonus

You can also dump JSON without escaping non-ASCII characters as follows:

json.dump(data, outfile, ensure_ascii=False)
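Putting both options together for the data from the question, a minimal sketch might look like this. It assumes Python 3 and PyYAML; the helper name convert_main and its key parameter are made up for illustration, and safe_dump plus default_flow_style=False are my own choices (allow_unicode does not require them) so that the result is plain block-style YAML.

```python
import json

import yaml  # PyYAML (third-party)


def convert_main(json_text, key="main"):
    """Return the mapping stored under `key` as block-style YAML text.

    Hypothetical helper for illustration only.
    """
    data = json.loads(json_text)[key]
    # allow_unicode=True writes non-ASCII characters as-is instead of escaping them.
    return yaml.safe_dump(data, allow_unicode=True, default_flow_style=False)


source = '{"main": {"title": "今日は雨が降って", "description": "今日は雨が降って"}}'
yaml_text = convert_main(source)

# The JSON counterpart: ensure_ascii=False keeps the characters unescaped too.
json_text = json.dumps(json.loads(source), ensure_ascii=False)

print(yaml_text)
print(json_text)
```

To write a file instead, open it with encoding="utf-8" and pass the file object as the second argument to safe_dump.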
Christopher Peisert
shoma
18

This works for me:

#!/usr/bin/env python

import sys
import json
import yaml

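with open(sys.argv[1]) as f:
    data = json.load(f)

# allow_unicode=True keeps non-ASCII text readable instead of escaping it
print(yaml.dump(data, default_flow_style=False, allow_unicode=True))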

So what we are doing is:

  1. load the JSON file into Python objects through json.load
  2. dump those objects as YAML through yaml.dump. With default_flow_style=True the data is written inline (flow style); with False it is written in block style, one key per line.

For background on Unicode handling, see "How to get string objects instead of Unicode from JSON?"
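To see what default_flow_style changes, here is a small sketch assuming Python 3 and PyYAML. The sample data is made up, and safe_dump is used only to keep the output free of Python-specific tags.

```python
import yaml  # PyYAML (third-party)

# Made-up sample data, not taken from the question.
data = {"title": "rain", "tags": ["a", "b"]}

flow_text = yaml.safe_dump(data, default_flow_style=True)
block_text = yaml.safe_dump(data, default_flow_style=False)

print(flow_text)   # everything inline, wrapped in braces
print(block_text)  # one key per line, list items introduced by "- "
```

Both strings load back to the same dict; only the layout differs.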

mrucci
Saurabh Hirani
  • Since loaded JSON and YAML files both work with Python dicts internally, simply doing `print(yaml.dump(json.load(open(sys.argv[1]))))` does the same thing. Works for Python 3. – jaaq Apr 22 '20 at 07:06
  • How do you get this to actually indent lists though? Especially after they follow a key. – FilBot3 Dec 07 '20 at 19:37
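On the list-indentation question: as far as I know PyYAML has no dump option for this, but a commonly used workaround is to subclass the dumper and override increase_indent so that sequences are never written "indentless". A sketch, assuming Python 3 and PyYAML (the class name IndentedDumper and the sample data are made up):

```python
import yaml  # PyYAML (third-party)


class IndentedDumper(yaml.SafeDumper):
    """Hypothetical dumper that indents list items under their parent key."""

    def increase_indent(self, flow=False, indentless=False):
        # Ignore the caller's indentless request so "- item" lines get indented.
        return super().increase_indent(flow, False)


data = {"tags": ["a", "b"]}
indented_text = yaml.dump(data, Dumper=IndentedDumper, default_flow_style=False)
print(indented_text)
```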
3
In [1]: import json, yaml

In [2]: with open('test.json') as js:
   ...:     data = json.load(js)[u'main']
   ...:     

In [3]: with open('test.yaml', 'w') as yml:
   ...:     yaml.dump(data, yml, allow_unicode=True)
   ...:     

In [4]: ! cat test.yaml
{!!python/unicode 'description': 今日は雨が降って, !!python/unicode 'title': 今日は雨が降って}

In [5]: with open('test.yaml', 'w') as yml:
   ...:     yaml.safe_dump(data, yml, allow_unicode=True)
   ...:     

In [6]: ! cat test.yaml
{description: 今日は雨が降って, title: 今日は雨が降って}
root
3

I simply do:

#!/usr/bin/env python
import sys
import json
import yaml

yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)
Mitar
  • Thanks! Using this with `yaml.safe_dump(...)` instead prevents the addition of the python unicode tags, but this converted nicely into a oneliner for me! – Cinderhaze Nov 19 '20 at 22:30
  • You are right. Since then there is `safe_dump`. :-) I updated the answer. – Mitar Nov 19 '20 at 22:58
2

The output is correct. The "\u...." sequences are escaped Unicode representations of your Japanese string. Once the data is decoded and written out with a proper encoding, it should display fine wherever you use it, e.g. on a web page.

Note that the data is equal despite the different string representations:

>>> import json
>>> j = '{    "main": {        "title": "今日は雨が降って",        "description": "今日は雨が降って"    }}'
>>> s = json.loads(j)
>>> t = json.dumps(s)
>>> j
'{    "main": {        "title": "\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf\xe9\x9b\xa8\xe3\x81\x8c\xe9\x99\x8d\xe3\x81\xa3\xe3\x81\xa6",        "description": "\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf\xe9\x9b\xa8\xe3\x81\x8c\xe9\x99\x8d\xe3\x81\xa3\xe3\x81\xa6"    }}'
>>> t
'{"main": {"description": "\\u4eca\\u65e5\\u306f\\u96e8\\u304c\\u964d\\u3063\\u3066", "title": "\\u4eca\\u65e5\\u306f\\u96e8\\u304c\\u964d\\u3063\\u3066"}}'
>>> s == json.loads(t)
True
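The same check can be done on the YAML side. This is a sketch assuming Python 3 and PyYAML; it only shows that the escaped and unescaped dumps load back to identical data.

```python
import yaml  # PyYAML (third-party)

data = {"title": "今日は雨が降って", "description": "今日は雨が降って"}

escaped_text = yaml.safe_dump(data)                  # non-ASCII written as escapes
raw_text = yaml.safe_dump(data, allow_unicode=True)  # non-ASCII written as-is

same_data = yaml.safe_load(escaped_text) == yaml.safe_load(raw_text) == data
print(same_data)
```

So the escaped file is valid and lossless; allow_unicode=True only matters if a person needs to read or edit the file.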
DhruvPathak