
I am writing a Python script to automate my YAML work: I will build a YAML structure from several CSV files. At the moment I am trying to understand the YAML structure through examples. While working through some YAML tutorials and examples, I ran into one problem that I haven't been able to address properly.

My Python code for this structure is as follows:

import sys
import yaml
from collections import OrderedDict

d = {'version': '22-07-2017', 'description': 'energie balance',
     'info': {
         'principalInvestigator': 'Kalthoff',
         'personInCharge': 'Scheer'
     },
     'dataSources': 'null',
     'devices': {
       'type': 'HMP',
       'info': {
           'description': 'temperature and humidity sensor',
           'company': 'Vaisala',
           'model': 'HMP35A',
           },
       'measures': {
           'quantity': 'T',
           'annotations': 'air',
           'sensors': {
               'number': '001',
               'sources': {
                   'id': 'null',
                   'frequency': '0.1',
                   'aggregation': 'AVG',
                   'field': 'null'
                   }
               }

           }
       }
     }
with open('/home/ali/Desktop/yaml-conf-task/result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False)

But when I open the YAML file, the keys are not in the order I declared them (they come out alphabetically). I get this:

dataSources: 'null'
description: energie balance
devices:
  info:
    company: Vaisala
    description: temperature and humidity sensor
    model: HMP35A
  measures:
    annotations: air
    quantity: T
    sensors:
      number: '001'
      sources:
        aggregation: AVG
        field: 'null'
        frequency: '0.1'
        id: 'null'
  type: HMP
info:
  personInCharge: Scheer
  principalInvestigator: Kalthoff
version: 22-07-2017

instead of the output I want:

version: 21-07-2017
description: energie balance
info:
  principalInvestigator: rob
  personInCharge: rio
dataSources: null
devices:
  - type: TMP
    info:
      description: temperature and humidity sensor
      company: Vio
      model: 35A
    measures:
      - quantity: T
        annotation: air
        sensors:
          - number: 001
            sources:
              - id: null
                frequency: 1
                aggregation: AVG
                field: null

If someone could suggest how I can maintain the order, I would be grateful. I looked through Stack Overflow, but couldn't solve my problem.

zwer
robbin
  • Python `dict` is an unordered structure (prior to 3.6, but one should still not rely on it). If you need to preserve the order, use `collections.OrderedDict` instead (you're importing it, but you're not using it to define your dictionary). – zwer Jul 22 '17 at 10:51
  • @ Yes, I am importing it, but I don't know how or where to use it properly. If you can guide me a bit, I would be thankful. – robbin Jul 22 '17 at 10:56
  • @Anthon Well, I am new to working with the YAML format. I came across that question, but it didn't solve my problem, and for someone new like me it is a bit hard to get an idea from it. – robbin Jul 22 '17 at 12:41

1 Answer


First of all, by specification YAML mappings are unordered (YAML 1.2 is also, technically, a superset of JSON, whose objects are likewise unordered). Hence, what you're trying to achieve is not something you can rely on everywhere: unless you control the full data flow (the writer and every reader), you can expect issues.
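To illustrate what "unordered" means in practice, here is a minimal sketch, assuming PyYAML is installed: two documents that differ only in key order load into Python dicts that compare equal, so a consumer is free to ignore the order you wrote.

```python
import yaml

# The same mapping written with its keys in two different orders
doc_a = "version: 22-07-2017\ndescription: energie balance\n"
doc_b = "description: energie balance\nversion: 22-07-2017\n"

a = yaml.safe_load(doc_a)
b = yaml.safe_load(doc_b)

# dict equality does not look at key order, so the two loads are equal
print(a == b)
```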

Also, as I said in my comment, Python's own dict does not guarantee insertion order before Python 3.7 (in 3.6 it is only a CPython implementation detail), but Python has collections.OrderedDict, and you can re-declare your structure to preserve the order as:

from collections import OrderedDict

d = OrderedDict([('version', '22-07-2017'), ('description', 'energie balance'),
                 ('info', OrderedDict([
                     ('principalInvestigator', 'Kalthoff'),
                     ('personInCharge', 'Scheer')
                 ])),
                 ('dataSources', 'null'),
                 ('devices', OrderedDict([
                     ('type', 'HMP'),
                     ('info', OrderedDict([
                         ('description', 'temperature and humidity sensor'),
                         ('company', 'Vaisala'),
                         ('model', 'HMP35A')
                     ])),
                     ('measures', OrderedDict([
                         ('quantity', 'T'),
                         ('annotations', 'air'),
                         ('sensors', OrderedDict([
                             ('number', '001'),
                             ('sources', OrderedDict([
                                 ('id', 'null'),
                                 ('frequency', '0.1'),
                                 ('aggregation', 'AVG'),
                                 ('field', 'null')
                             ]))
                         ]))
                     ]))
                 ]))
                 ])

Yes, it's a bit nastier than a clean dict structure, since you have to use nested lists of tuples to preserve the order, but once you get used to it, it's not all that difficult: replace every dict literal with OrderedDict([...]) and every key: value pair with a (key, value) tuple.
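A side note that postdates this answer: on Python 3.7+ a plain dict is guaranteed to keep insertion order, and PyYAML 5.1 added a `sort_keys` parameter to `yaml.dump`. The alphabetical output in the question comes from PyYAML sorting mapping keys by default, so a minimal sketch, assuming those versions, can skip OrderedDict entirely:

```python
import yaml

# On Python 3.7+ a plain dict keeps the order in which keys were inserted
d = {
    'version': '22-07-2017',
    'description': 'energie balance',
    'info': {'principalInvestigator': 'Kalthoff', 'personInCharge': 'Scheer'},
}

# sort_keys=False (PyYAML 5.1+) emits keys in insertion order instead of sorting them
text = yaml.dump(d, default_flow_style=False, sort_keys=False)
print(text)
```

On older Python or PyYAML versions, the OrderedDict approach in this answer is still needed.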

But that's only one part of the equation: once you have a dict-like structure that keeps its order, your YAML serializer must also be aware of it. If you just dump the above structure with PyYAML's yaml.dump and its default Dumper, you'll get:

!!python/object/apply:collections.OrderedDict
- - [version, 22-07-2017]
  - [description, energie balance]
  - - info
    - !!python/object/apply:collections.OrderedDict
      - - [principalInvestigator, Kalthoff]
        - [personInCharge, Scheer]
  - [dataSources, 'null']
  - - devices
    - !!python/object/apply:collections.OrderedDict
      - - [type, HMP]
        - - info
          - !!python/object/apply:collections.OrderedDict
            - - [description, temperature and humidity sensor]
              - [company, Vaisala]
              - [model, HMP35A]
        - - measures
          - !!python/object/apply:collections.OrderedDict
            - - [quantity, T]
              - [annotations, air]
              - - sensors
                - !!python/object/apply:collections.OrderedDict
                  - - [number, '001']
                    - - sources
                      - !!python/object/apply:collections.OrderedDict
                        - - [id, 'null']
                          - [frequency, '0.1']
                          - [aggregation, AVG]
                          - [field, 'null']

It does keep the order, but it exports the actual collections.OrderedDict type (via the Python-specific !!python/object/apply tag) so that it can be loaded back into the same structure, and that's not what you want. Instead, you need to tell PyYAML to represent your OrderedDict as a regular YAML mapping:

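import yaml

def ordered_dict_representer(self, value):  # can be a lambda if that's what you prefer
    # Emit the OrderedDict as a plain YAML map (no Python-specific tag).
    # Passing value.items() instead of value also keeps PyYAML from sorting the keys.
    return self.represent_mapping('tag:yaml.org,2002:map', value.items())

yaml.add_representer(OrderedDict, ordered_dict_representer)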

And now if you export it as:

with open('/home/ali/Desktop/yaml-conf-task/result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False)

You'll get:

version: 22-07-2017
description: energie balance
info:
  principalInvestigator: Kalthoff
  personInCharge: Scheer
dataSources: 'null'
devices:
  type: HMP
  info:
    description: temperature and humidity sensor
    company: Vaisala
    model: HMP35A
  measures:
    quantity: T
    annotations: air
    sensors:
      number: '001'
      sources:
        id: 'null'
        frequency: '0.1'
        aggregation: AVG
        field: 'null'
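The target output in the question also differs in two ways that ordering does not explain: `devices`, `measures`, `sensors` and `sources` are YAML sequences (the `- ` items), and `dataSources`, `id` and `field` are a bare `null`. A minimal sketch, assuming the same representer technique: Python lists become block sequences, and `None` becomes `null`, whereas the string `'null'` gets quoted. It registers the representer on a dedicated Dumper subclass (the name `OrderedDumper` is hypothetical) so the global `yaml.Dumper` is left untouched, and it uses only a trimmed part of the question's data.

```python
from collections import OrderedDict

import yaml


class OrderedDumper(yaml.Dumper):
    """Hypothetical Dumper subclass, so the representer is not registered globally."""


def ordered_dict_representer(dumper, value):
    # A plain YAML map; the items() view is emitted in order, without sorting
    return dumper.represent_mapping('tag:yaml.org,2002:map', value.items())


OrderedDumper.add_representer(OrderedDict, ordered_dict_representer)

d = OrderedDict([
    ('version', '22-07-2017'),
    ('dataSources', None),  # None is emitted as a bare null
    ('devices', [           # a list is emitted as a block sequence
        OrderedDict([
            ('type', 'HMP'),
            ('measures', [
                OrderedDict([('quantity', 'T'), ('annotations', 'air')]),
            ]),
        ]),
    ]),
])

text = yaml.dump(d, Dumper=OrderedDumper, default_flow_style=False)
print(text)
```

To get the full target structure, wrap each repeated element (device, measure, sensor, source) in a list in the same way.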
zwer