2

I have a YAML file called input.yaml:

---
'001':
  name: Ben
  email: ben@test.com
'002':
  name: Lisa
  email: lisa@test.com
'003':
  name: Alex
  email: alex@test.com
.
.
.

I have a dictionary:

my_dict = {'001': '000-111-2222', '002': '000-111-2223', '003': '000-111-2224', ...}

I would like to have an updated file called output.yaml that looks like this:

---
'001':
  name: Ben
  email: ben@test.com
  phone: 000-111-2222
'002':
  name: Lisa
  email: lisa@test.com
  phone: 000-111-2223
'003':
  name: Alex
  email: alex@test.com
  phone: 000-111-2224
.
.
.

Note that each entry in the output file gains a "phone" field whose value is the dictionary value for the matching key.

How do I produce such a file? I have tried all sorts of approaches.

Anthon
David

2 Answers

3

If you want the format of the file to stay the same (and any comments in it to be preserved), you can do:

import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.explicit_start = True

with open('input.yaml') as fp:
    data = yaml.load(fp)

my_dict = {
    '001': '000-111-2222',
    '002': '000-111-2223',
    '003': '000-111-2224',
}

for k in my_dict:
    data.setdefault(k, {})['phone'] = my_dict[k]

with open('output.yaml', 'w') as fp:
    yaml.dump(data, fp)

After which output.yaml contains:

---
'001':
  name: Ben
  email: ben@test.com
  phone: 000-111-2222
'002':
  name: Lisa
  email: lisa@test.com
  phone: 000-111-2223
'003':
  name: Alex
  email: alex@test.com
  phone: 000-111-2224

Notes:

  1. The yaml.preserve_quotes = True is not really necessary: for scalars that need quotes (your strings starting with zero), single quotes are the default, and there are no superfluous quotes in your input either.

  2. I use data.setdefault(k, {})['phone'] instead of checking whether data[k] exists, as @Aaron suggests in a comment in his code. It will create an (empty) dict if the key k is not in data.

  3. If you only want to update matching keys, then use the following in the for loop:

    try:
        data[k]['phone'] = my_dict[k]
    except KeyError:
        pass
    
  4. You need yaml.explicit_start = True to get the --- at the document start; ruamel.yaml doesn't automatically preserve that. If you need the document end marker (...) as well, use: yaml.explicit_end = True

  5. If you want the phone number to appear between name and email, then use:

    data.setdefault(k, {}).insert(1, 'phone', my_dict[k])
    

    which gives:

    ---
    '001':
      name: Ben
      phone: 000-111-2222
      email: ben@test.com
    '002':
      name: Lisa
      phone: 000-111-2223
      email: lisa@test.com
    '003':
      name: Alex
      phone: 000-111-2224
      email: alex@test.com
    

    (i.e. 0 means insert before the first key, 1 before the second key, etc.)

Anthon
  • Just a suggestion. This code won't work if a user has more than 2 phone numbers. Perhaps also take that into consideration in your code :) – d_- Jul 16 '19 at 15:44
  • @Dirk Why would that not work? If a value in `my_dict` is a list of (phone number) strings instead of a single string, the code still works AFAICT – Anthon Jul 17 '19 at 06:02
  • Try running my_dict = { '001': '000-111-2222', '001': '000-314-2222', '002': '000-111-2223', '003': '000-111-2224', } – d_- Jul 17 '19 at 08:17
  • In the Python versions I use, any duplicate keys, in dicts, are overwritten (which one depends on the key and the Python version). Have you tried `print({ '001': '000-111-2222', '001': '000-314-2222', '002': '000-111-2223', '003': '000-111-2224', })` in your Python? If it does show two keys `001`, please provide a link to the Python version and specify which platform you are using. If it doesn't show two keys `001` please explain how you expect the other value to show up in the YAML output, if it is not in the data handed to the `dump` method. – Anthon Jul 17 '19 at 08:24
  • It is the latter case. Firstly, thanks for your answer. It was very helpful (and got my upvote). Secondly, I was looking at processing a dictionary which has multiple duplicate keys (as shown in my previous comment) and was hoping it would work, but it did not :( Your code only takes into consideration the last unique key as it loops over the dict. Is it possible to update it so that it also takes into consideration duplicate keys (if they exist)? Thanks in advance! – d_- Jul 17 '19 at 09:12
  • @Dirk YAML doesn't support duplicate keys in mappings (the normal representation for Python dicts), that is why I indicated at least to use sequences/lists. It would IMO be better to use some tagged value for the key `001` to handle multiple phone numbers, but that is more than comments here can handle: in that case post a new question with clear description of where you start from and what end-result you want. – Anthon Jul 17 '19 at 09:32
  • Ok. Makes sense :) I will re-evaluate my approach, maybe YAML is the wrong solution. Maybe I should do it in a relational database instead, where relational mappings are easier. Thanks for the follow-up! – d_- Jul 18 '19 at 15:32
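
The duplicate-key point from the comments can be illustrated with plain Python. This is a sketch under an assumption: the `pairs` list is a hypothetical input format (key/phone tuples) chosen so that repeated keys are not lost before any YAML is written; it is not something from the original question.

```python
from collections import defaultdict

# A dict literal with a repeated key keeps a single entry for that key;
# in current CPython the last value wins.
collapsed = {'001': '000-111-2222', '001': '000-314-2222', '002': '000-111-2223'}
print(collapsed)

# Hypothetical input: (key, phone) pairs, so repeated keys survive.
pairs = [
    ('001', '000-111-2222'),
    ('001', '000-314-2222'),
    ('002', '000-111-2223'),
]

# Group the phone numbers per key into lists.
phones = defaultdict(list)
for key, phone in pairs:
    phones[key].append(phone)

my_dict = dict(phones)
print(my_dict)
```

With `my_dict` in this shape, every value is a list of phone strings, which is the sequence form suggested in the comments; the loop in the answer would then assign the whole list as the `phone` value.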
1

Aside from reading & writing to file, maybe this will point you in the right direction:

import yaml


document = """
---
'001':
  name: Ben
  email: ben@test.com
'002':
  name: Lisa
  email: lisa@test.com
'003':
  name: Alex
  email: alex@test.com
"""

phones = {'001': '000-111-2222', '002': '000-111-2223', '003': '000-111-2224'}

doc = yaml.safe_load(document)

for k, v in phones.items():
    # Might want to check that 'doc[k]' exists
    doc[k]['phone'] = v

print(yaml.safe_dump(doc, default_flow_style=False, explicit_start=True))

Output:

---
'001':
  email: ben@test.com
  name: Ben
  phone: 000-111-2222
'002':
  email: lisa@test.com
  name: Lisa
  phone: 000-111-2223
'003':
  email: alex@test.com
  name: Alex
  phone: 000-111-2224
Aaron N. Brock
  • This seems to be working in terms of adding the phone field, BUT, it has changed the format of the file ... How can I preserve the format and the state? and JUST add the phone field after the email field? – David Apr 17 '18 at 22:18
  • @Becks you can use an `OrderedDict` to achieve this, see [this answer](https://stackoverflow.com/a/16782282/7607701)! – Aaron N. Brock Apr 17 '18 at 22:58
  • There is no need to use the unsafe `load()`; you should always use `safe_load()`. PyYAML is stream based and, if you don't provide a stream, falls back to buffering the output and returning it as a string. Therefore your `print(yaml.dump(doc))` is slow and inefficient (and can cause you to run out of memory), and should be written `yaml.safe_dump(doc, sys.stdout)`. Your output doesn't include the document start marker the OP requested; you can use the parameter `explicit_start=True` for that. – Anthon Apr 18 '18 at 06:56
  • @Anthon, thanks I applied the edits. However, I left it as `print` since it's really a debug statement more than the actual result, since it "should" be written to a file. – Aaron N. Brock Apr 18 '18 at 20:29
  • @AaronN.Brock I can understand your sentiment about `print()`: it is more clearly a debug statement than `yaml.safe_dump(doc, sys.stdout)`. Maybe you'll have a different opinion once you've seen code dump large YAML files (hundreds of MB) using `print(yaml.safe_dump(doc), file=some_file_pointer)`, as I have in reviews ;-) – Anthon Apr 19 '18 at 04:46
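
Pulling the points from this comment thread together, here is a rough sketch of the PyYAML variant that dumps straight to a stream, skips phone entries without a matching record, and keeps the key order as loaded. It assumes PyYAML 5.1 or later (for the `sort_keys` parameter); the '404' entry is hypothetical, and `io.StringIO` stands in for an open output file. Unlike ruamel.yaml, this does not preserve comments from the input.

```python
import io

import yaml  # PyYAML

document = """\
---
'001':
  name: Ben
  email: ben@test.com
'002':
  name: Lisa
  email: lisa@test.com
"""

# '404' is a made-up key with no matching record in the document.
phones = {'001': '000-111-2222', '002': '000-111-2223', '404': '000-111-0000'}

doc = yaml.safe_load(document)

for key, phone in phones.items():
    if key in doc:  # skip phone entries with no matching record
        doc[key]['phone'] = phone

# Stand-in for a real file; with a file you would use
# `with open('output.yaml', 'w') as stream:` instead.
stream = io.StringIO()
yaml.safe_dump(
    doc,
    stream,                    # dump to the stream, no intermediate string
    default_flow_style=False,
    explicit_start=True,       # emit the leading '---'
    sort_keys=False,           # keep name, email, phone in insertion order
)
output = stream.getvalue()
print(output)
```

Passing the stream as the second argument is what avoids building the whole document as one string first, which matters for large files.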