
I have a CSV file containing data where the header contains keys and the cells contain values. I would like to use Python to create a YAML file from the contents of the CSV file.

I created a dictionary of the K:V pairs; however, I am stuck trying to get the K:V pairs into the yaml file.

The structure of the yaml must be:

key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---

If I were to manually create these, I would have more than 1000 YAMLs so it's pretty time consuming and unrealistic.

I am looking for any ideas you much more experienced people might have.

I would really like the output to iterate through the dictionary to create a huge listing of YAMLs like below:

key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---
key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---
key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---
key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---

Sample Code:

import csv
import sys

import yaml

def csv_dict_list(variables_file):
    # Read every CSV row into a dict keyed by the header row.
    with open(variables_file, 'r') as csv_file:
        reader = csv.DictReader(csv_file)
        dict_list = []
        for line in reader:
            dict_list.append(line)
    return dict_list

yaml_values = csv_dict_list(sys.argv[1])

No matter what I try after this, I cannot get the desired output using yaml.load() or yaml.load_all().

ShortCircuit
  • Possible duplicate of [How can I write data in YAML format in a file?](https://stackoverflow.com/questions/12470665/how-can-i-write-data-in-yaml-format-in-a-file). Googling your title gave me this as the first match, so if there is some way in which this is insufficient, please describe why, thanks. – Ken Y-N Feb 07 '18 at 00:13
  • That's where I am stuck I guess. I have the dictionary created, but how do I read it in rather than manually specify the dictionary lines and allow for it to iterate through the dictionary? – ShortCircuit Feb 07 '18 at 00:23
  • Please post the code that you have so far, then, so we can look at that. The linked dup shows you how to dump one dictionary, and there is also the `dump_all()` method that takes a list of dictionaries. – Ken Y-N Feb 07 '18 at 00:32
  • Posted above as "sample code". – ShortCircuit Feb 07 '18 at 01:48
  • So, `yaml_values` is something like `[0] = {key1: val1}, [1] = {key1: val1}` etc and you would want something like `toYAML(yaml_values[0]) '---' toYAML(yaml_values[1])` in a single file as you indicate in your sample data? – Ken Y-N Feb 07 '18 at 02:58
  • yes exactly right – ShortCircuit Feb 07 '18 at 03:02

1 Answer


First of all, you should use dump() or dump_all(), since you want to write YAML, instead of using load().
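
As a minimal sketch of the dumping direction, assuming PyYAML is installed and using made-up rows in place of the real CSV data, `yaml.safe_dump_all()` can write a list of dictionaries as one multi-document stream, with `---` between the documents:

```python
import io

import yaml  # PyYAML

# Hypothetical stand-in for the list returned by csv_dict_list().
rows = [
    {"key1": "value_a1", "key2": "value_a2"},
    {"key1": "value_b1", "key2": "value_b2"},
]

# Dump every dict as its own YAML document into one stream.
# An open file object works in place of the StringIO buffer.
stream = io.StringIO()
yaml.safe_dump_all(rows, stream, default_flow_style=False)
text = stream.getvalue()
print(text)

# Loading the stream back should yield the original list of dicts.
reloaded = list(yaml.safe_load_all(text))
```

This only covers flat rows; the nested key3/key5 structure from the question still has to be built in Python before dumping.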

You should also be aware that the CSV reader returns something different on Python 2.7 than on, e.g., Python 3.6: on the former you get a list of dict back from csv_dict_list, and on the latter a list of OrderedDict. That in itself would not be a problem, but PyYAML dumps a dict with the keys sorted, and an OrderedDict with a tag.
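
To see which row type a given interpreter produces, a small check such as the following can help. It is only a sketch: the in-memory CSV text and the names `raw_rows` and `plain_rows` are illustrative and not part of the original code. Note that Python 3.8 and later return plain dict rows again.

```python
import csv
import io

# Hypothetical two-column CSV held in memory instead of a file on disk.
csv_text = "key1,key2\nvalue_a1,value_a2\nvalue_b1,value_b2\n"

reader = csv.DictReader(io.StringIO(csv_text))
raw_rows = list(reader)

# The concrete row type depends on the Python version.
print(type(raw_rows[0]).__name__)

# Converting each row to a plain dict gives the dumper the same
# input type everywhere; column order is kept on Python 3.7+.
plain_rows = [dict(row) for row in raw_rows]
print(plain_rows)
```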

Your proposed YAML is also not valid, as the flow style mapping in the line:

 key5: {key6: [value6]

is not terminated with a } before the end of the document. You also cannot have:

key9: value9
  -
  - 

Either use:

key9: value9
key10:
  -
  -

or

key9: 
  - value9
  -

or something similar (there is also no equivalent Python data structure that has both a value and a list for one and the same key, so you cannot actually create something like that even in Python).
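
A quick way to check these points is to load the snippets and look at the resulting Python objects. The sketch below assumes PyYAML is installed; the variable names are illustrative only.

```python
import yaml  # PyYAML

# The two valid alternatives from above; empty sequence entries load as None.
separate_key = "key9: value9\nkey10:\n  -\n  -\n"
list_value = "key9:\n  - value9\n  -\n"

print(yaml.safe_load(separate_key))
print(yaml.safe_load(list_value))

# The flow mapping from the question is never closed, so parsing fails.
unterminated = "key5: {key6: [value6]\nkey7: value7\n"
try:
    yaml.safe_load(unterminated)
    parse_failed = False
except yaml.YAMLError as exc:
    parse_failed = True
    print("invalid YAML:", type(exc).__name__)
```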

PyYAML additionally lacks support for indenting block style sequences. If you do:

import yaml
print(yaml.dump(dict(x=[dict(a=1, b=2)]), indent=4))

the output will still be flush left:

x:
- {a: 1, b: 2}

To prevent all these problems you will run into when using PyYAML, and to circumvent the differences in Python versions, I recommend you use ruamel.yaml (disclaimer: I am the author of that package), and the following code:

import sys
import csv
import ruamel.yaml

Dict = ruamel.yaml.comments.CommentedMap

def csv_dict_list(variables_file) :
    reader=csv.reader(open(variables_file, 'r'))
    key_list = None
    dict_list = []
    for line in reader:
        if key_list is None:
            key_list = line
            continue
        d = Dict()
        for idx, v in enumerate(line):
            k = key_list[idx]
            # special handling of key3/key4/key5/key6
            if k == key_list[2]:
                d[k] = []
            elif k == key_list[3]:
                d[key_list[2]].append(Dict([(k, v)]))
            elif k == key_list[4]:
                d[key_list[2]][0][k] = dt = Dict()
                dt.fa.set_flow_style()
            elif k == key_list[5]:
                d[key_list[2]][0][key_list[4]][k] = [v]
            else:
                d[k] = v
        dict_list.append(d)
    return dict_list

data = csv_dict_list('test.csv')


yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)
yaml.dump_all(data, sys.stdout)

With test.csv:

key1,key2,key3,key4,key5,key6,key7,key8,key9
value_a1,value_a2,value_a3,value_a4,value_a5,value_a6,value_a7,value_a8,value_a9
value_b1,value_b2,value_b3,value_b4,value_b5,value_b6,value_b7,value_b8,value_b9

this gives:

key1: value_a1
key2: value_a2
key3:
  - key4: value_a4
    key5: {key6: [value_a6]}
key7: value_a7
key8: value_a8
key9: value_a9
---
key1: value_b1
key2: value_b2
key3:
  - key4: value_b4
    key5: {key6: [value_b6]}
key7: value_b7
key8: value_b8
key9: value_b9

on both Python 2.7 and Python 3.6.

Anthon
  • Thank you! I had seen ruamel.yaml; however, I hadn't tried it as a solution. In fact, this worked great. One thing I am looking for help with is that the output actually looks like: `"\uFEFFkey1": value_b1 key2: value_b2 key3: - key4: value_b4 key5: {key6: [value_b6]} key7: value_b7 ' ' key8: value_b8 key9: value_b9` – ShortCircuit Feb 09 '18 at 20:57
  • That `\uFEFF` is a [byte order mark (BOM)](https://en.wikipedia.org/wiki/Byte_order_mark). Depending on which version of Python you use and how you open the file, it might not be stripped on opening. – Anthon Feb 10 '18 at 07:29
  • I want to shift the output format slightly and I can't get it to work right from document pages. It would be fantastic if you could help. Looking for `key1: value_a1 /n key2: value_a2 /n key3: /n - key4: value_a4 /n key5: /n key6: /n - value_a6 /n key7: value_a7 /n key8: value_a8 /n key9: value_a9`. /n = newline, can't get it to work in new comment window correctly – ShortCircuit Feb 19 '18 at 00:54
  • I have no idea what you expect, try not to abuse comments for (follow up) questions. You should post a new follow up question with proper formatting of the output that you want. Refer to this one, the code that you have. If you tag it [tag:python], [tag:yaml], and [tag:ruamel.yaml], I'll get notified automatically – Anthon Feb 19 '18 at 06:16
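
Regarding the `\uFEFF` prefix raised in the comments: on Python 3, one way to avoid it is to open the CSV with the `utf-8-sig` codec. This is a sketch assuming the file is UTF-8 with a BOM (the comments do not confirm the actual encoding); the temporary file and its contents are made up for illustration.

```python
import csv
import os
import tempfile

# Write a small CSV that starts with a UTF-8 byte order mark,
# as some spreadsheet exports do (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfkey1,key2\nvalue_a1,value_a2\n")

# Plain UTF-8 decoding keeps the BOM as part of the first header.
with open(path, "r", encoding="utf-8", newline="") as f:
    header_with_bom = next(csv.reader(f))

# The utf-8-sig codec drops a leading BOM if one is present.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    header_clean = next(csv.reader(f))

print(repr(header_with_bom[0]))
print(repr(header_clean[0]))
```

If the file turns out to be UTF-16 instead, the encoding argument would need to change accordingly.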