
I am looking for a text-based data format that supports multiline strings.

JSON does not allow multiline strings:

>>> import json
>>> json.dumps(dict(text='first line\nsecond line'))
'{"text": "first line\\nsecond line"}'

My desired output:

{"text": "first line
second line"}

This question is about both input and output. The data format should be editable with an editor like vi, emacs or notepad.

I don't care whether plain quotes (") or triple quotes (""", as in Python) are used.

Is there an easy, human-readable textual data interchange format that supports this?

Use case

I want to edit data containing multiline strings with vi. That is no fun if the data is in JSON format.

guettli
  • Can you elaborate on the data format/purpose, i.e. complex structures or a settings/config file, etc.? – Nabeel Ahmed Aug 22 '16 at 07:17
  • @NabeelAhmed I want to use it for configuration. A lot of applications invent their own configuration language. I want to avoid this. But json and ConfigParser don't satisfy me. Json does not allow strings with newlines (only \n) and ConfigParser does not allow nested data structures. Next thing that I am missing: Validation (But this is a different topic). Dear Nabeel, please leave a new comment if there is something missing. – guettli Aug 22 '16 at 09:47
  • I think if you post-process the dump result with a replace, the result should be right: `data = json.dumps(dict(text='first line\nsecond line')); data = data.replace('\\n', '\n'); print(data)` – Em L Aug 25 '16 at 08:14

7 Answers

I think you should consider the YAML format. It supports a block notation that preserves newlines, like this:

data: |
   There once was a short man from Ealing
   Who got on a bus to Darjeeling
       It said on the door
       "Please don't spit on the floor"
   So he carefully spat on the ceiling

There are also parsers for many programming languages, including Python (e.g. PyYAML).

Another big advantage is that valid JSON is also valid YAML (YAML 1.2 was designed as a superset of JSON).
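A minimal sketch of loading the block above in Python, assuming the third-party PyYAML package is installed (pip install pyyaml). It shows that the literal block keeps every line break, keeps the extra indentation of the more-indented lines, and that a JSON-style mapping loads unchanged.

```python
import yaml  # PyYAML, a third-party package (assumed installed)

# The "|" starts a literal block scalar: newlines are kept as-is and the
# common indentation (3 spaces here) is removed from each line.
document = """\
data: |
   There once was a short man from Ealing
   Who got on a bus to Darjeeling
       It said on the door
       "Please don't spit on the floor"
   So he carefully spat on the ceiling
"""

config = yaml.safe_load(document)
print(config["data"])

# A JSON document is also accepted by the YAML parser.
from_json = yaml.safe_load('{"text": "first line\\nsecond line"}')
print(from_json)
```

yaml.safe_load is used rather than yaml.load because it only builds plain Python objects (dicts, lists, strings, numbers), which is all a configuration file needs.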

vsminkov
  • But the indentation ... It means you cannot paste the data straight into a value without indenting each line. – ingyhere Sep 11 '21 at 01:34

Apropos of your comment:

I want to use it for configuration. A lot of applications invent their own configuration language. I want to avoid this. But json and ConfigParser don't satisfy me. Json does not allow strings with newlines (only \n) and ConfigParser does not allow nested data structures. Next thing that I am missing: Validation (But this is a different topic).

You have three main options: ConfigParser, ConfigObj, or YAML (PyYAML), each with its own pros and cons. All three are better than JSON for your use case, i.e. a configuration file.

Which one is best depends on what exactly you want to store in your config file.


ConfigObj: for configuration and validation (your use case)

ConfigObj is simpler to use than YAML (and ConfigParser). It supports default values and types, and also includes validation (a huge plus over ConfigParser).

An Introduction to ConfigObj

When you perform validation, each of the members in your specification are checked and they undergo a process that converts the values into the specified type. Missing values that have defaults will be filled in, and validation returns either True to indicate success or a dictionary with members that failed validation. The individual checks and conversions are performed by functions, and adding your own check function is very easy.

P.S. Yes, it allows multiline values.


Helpful links:

A Brief ConfigObj Tutorial

ConfigObj 5 Introduction and Reference


There are solid SO answers available on the comparison YAML vs ConfigParser vs ConfigObj:

What's better, ConfigObj or ConfigParser?

ConfigObj/ConfigParser vs. using YAML for Python settings file


Nabeel Ahmed

If the files are only ever used by Python (setting aside the interchange requirement), you could simply put your data in a Python file and import it as a module:

Data (data.py)

datum_1 = """ lorem
ipsum
dolor
"""
datum_list = [1, """two
liner"""]
datum_dict = {"key": None, "another": [None, 42.13]}
datum_tuple = ("anything", "goes")

Script

from data import *
d = [e for e in locals() if not e.startswith("__")]
print( d )
for k in d:
  print( k, locals()[k] )

Output

['datum_list', 'datum_1', 'datum_dict', 'datum_tuple']
datum_list [1, 'two\nliner']
datum_1  lorem
ipsum
dolor

datum_dict {'another': [None, 42.13], 'key': None}
datum_tuple ('anything', 'goes')


Update:

Code with dictionary comprehension

from data import *
d = {e:globals()[e] for e in globals() if not e.startswith("__")}
for k in d:
  print( k, d[k] )
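A variant of the same idea, swapped in here rather than taken from the answer: the standard library's runpy.run_path executes a Python file and returns its global namespace as a dict, which avoids the star import and the locals()/globals() scan. The sketch writes a hypothetical data.py into a temporary directory only so that it is self-contained. Like importing, it executes the file, so use it only with trusted input.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical data file, mirroring part of data.py above.
DATA_SOURCE = '''\
datum_1 = """ lorem
ipsum
dolor
"""
datum_list = [1, """two
liner"""]
datum_dict = {"key": None, "another": [None, 42.13]}
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.py"
    path.write_text(DATA_SOURCE)
    # run_path executes the file and returns its global namespace as a dict.
    namespace = runpy.run_path(str(path))

# Keep only the data, dropping entries such as __name__ and __builtins__.
data = {k: v for k, v in namespace.items() if not k.startswith("__")}
for key, value in data.items():
    print(key, repr(value))
```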
handle

XML with ElementTree (standard library) or lxml if you are OK with the markup overhead:

Data (data.xml)

<?xml version="1.0"?>
<data>
  <string>Lorem
Ipsum
Dolor
  </string>
</data>

Script

import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('data.xml').getroot()
for child in root:
  print(child.tag, child.attrib, child.text)

Output

string {} Lorem
Ipsum
Dolor
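A self-contained sketch of the same ElementTree approach, parsing an inline string with ET.fromstring instead of reading data.xml. The extra raw element is hypothetical; it shows that a CDATA section also comes back as plain text with its newlines intact, without escaping < or &.

```python
import xml.etree.ElementTree as ET

# Same content as data.xml above, plus a hypothetical CDATA example.
XML_SOURCE = """<?xml version="1.0"?>
<data>
  <string>Lorem
Ipsum
Dolor
  </string>
  <raw><![CDATA[x < y && y > z
second line]]></raw>
</data>"""

root = ET.fromstring(XML_SOURCE)
for child in root:
    # child.text keeps the newlines exactly as written, including the
    # trailing newline and indentation before a closing tag.
    print(child.tag, repr(child.text))
```

Note that the whitespace before </string> becomes part of the value, so you may want to call .strip() on the text when reading.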
handle
  • In XML the data can be pasted directly into the document without altering it to add things such as indents. I think CDATA is also an option to represent more complex values. – ingyhere Sep 11 '21 at 01:36

The INI format also supports multiline strings; configparser from the Python standard library can handle them. See https://docs.python.org/3/library/configparser.html#supported-ini-file-structure.
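A minimal sketch of a multiline value with configparser (the section and key names are made up for illustration): continuation lines must be indented more deeply than the key, and configparser strips that indentation and joins the lines with newlines.

```python
import configparser

# The indented second line is a continuation of the "text" value.
INI_SOURCE = """\
[message]
text = first line
    second line
"""

config = configparser.ConfigParser()
config.read_string(INI_SOURCE)
print(config["message"]["text"])
```

Sections only give you one level of nesting, which is the limitation of ConfigParser mentioned in the comments on the question.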

Mikhail Korobov

If you're using Python 2, I think json can actually do what you need. You can dump and load the JSON while decoding and encoding it with the string-escape codec:

import json

config_dict = {
    'text': 'first line\nsecond line',
}

config_str = json.dumps(config_dict).decode('string-escape')
print config_str

config_dict = json.loads(config_str.encode('string-escape'))
print config_dict

Output:

{"text": "first line
second line"}

{u'text': u'first line\nsecond line'}

So, you can use the decoded string to edit your JSON, newlines included, and when reading it, just encode with string-escape to get the dictionary back.
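The string-escape codec no longer exists in Python 3. A rough Python 3 sketch of the same idea, swapped in rather than taken from this answer: the dumping side uses the replace() trick from Em L's comment on the question, and the loading side relies on json.loads(..., strict=False), which accepts raw control characters such as line breaks inside strings.

```python
import json

config_dict = {"text": "first line\nsecond line"}

# Dump as usual, then turn each JSON escape "\n" into a real line break.
# Naive: a value containing a literal backslash followed by "n" gets
# mangled, so this is only safe for data without that sequence.
config_str = json.dumps(config_dict).replace("\\n", "\n")
print(config_str)

# strict=False allows control characters (including newlines) inside
# strings, which strict JSON forbids.
loaded = json.loads(config_str, strict=False)
print(loaded)
```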

Karin

Not sure whether I've understood your question correctly, but aren't you asking for something like this?

my_config = {
    "text": """first line
second line"""
}

print(my_config)
BPL
  • What kind of data format is this? You show Python source. This was already the answer of user "handle". – guettli Aug 25 '16 at 07:18
  • @guettli Oh, that's right, my point was exactly the same as user "handle"'s. – BPL Aug 25 '16 at 11:25