
I want to load duplicate keys with ruamel.yaml. There is one constraint: the input file is huge, so I have to use the CLoader. I therefore need a loading hack that keeps duplicate keys and their values while using the CLoader. Example:

a.yaml:

a: {a1: 1, a2: 2, a3: 3}
a: {a1: 4, a2: 5, a3: 6}
b: {b1: 1, b2: 2}

a.py:

import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True  # do not raise an error on duplicate keys
with open("a.yaml") as document:
    data = yaml.load(document)
yaml.dump(data, sys.stdout)

The output is:

a: {a1: 4, a2: 5, a3: 6}
b: {b1: 1, b2: 2}

The first 'a' row is lost. I tried this in PyYAML with help from "Getting duplicate keys in YAML using Python", but loading is too slow: about 55 seconds for a file of 40-50 thousand lines. After reading some forums I learned that the CLoader should be used. (I am trying to get the CLoader for PyYAML.)

But I want this to work in ruamel.yaml (for the flexibility I might need later), satisfying two main requirements:

  1. LOAD: duplicate keys and their values must be loaded, not ignored.

  2. TIME: loading should not be slow.

How can I solve this?

  • How do you want the duplicates to be resolved? A key could hold two values if they are in an array, or you can decide that one of the duplicate keys will be renamed. – Micks Ketches Jul 11 '18 at 16:01
  • You can call that `a.yaml` but of course it is **not** a YAML document. As the [YAML 1.2 spec indicates](http://yaml.org/spec/1.2/spec.html#id2762313), in a YAML document: "mapping - an unordered association of unique keys to values". Your keys are not unique, and therefore this is not a YAML document. You can of course tweak the parser to accept this and generate some multidict, but you should dump that as valid YAML at the first opportunity you get (as a special type that you can reload). – Anthon Jul 11 '18 at 17:47
  • BTW the CLoader does not help if the size of your **code** is big, it only helps when the **data** (i.e. the YAML document) is big. – Anthon Jul 11 '18 at 17:55
  • I do not want the duplicate keys to be renamed. I want to access all the keys with their values. I have got it working in PyYAML (as per the link above), so I wanted a similar fix in ruamel.yaml. The issue is that the fix to load and access duplicate keys has to work with the CLoader, as the default Python parser is too slow. – Preetika Tandon Jul 12 '18 at 04:01
  • I meant the large input file itself. (Corrected my description.) @Anthon I have a constraint where the file has duplicate keys. Just looking for some possible tweak. – Preetika Tandon Jul 12 '18 at 05:34
  • @PreetikaTandon As long as you don't answer Micks' question you cannot be helped. Just writing "I want to access all the keys with their values" doesn't say anything about "how". Give a complete code example, with a small example non-YAML input, instead of referring to another question and letting us guess. – Anthon Jul 12 '18 at 05:49
  • I deleted your verbatim copy of this (very poorly worded) question that you posted as an issue on bitbucket. Please don't do that again. Before I will consider looking at this again I also need to know what generates this crappy almost YAML and have an example big enough to be **representative** for all of your input. – Anthon Jul 12 '18 at 05:57
  • Btw I got it working. I used the CLoader from ruamel.yaml and the custom loader from stackoverflow.com/questions/44904290/… . This reduced the loading time from 100 sec to 10 sec. – Preetika Tandon Jul 12 '18 at 08:53
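
The combination mentioned in the last comment (a C-based loader plus a custom mapping constructor) might look roughly like the sketch below. It is written against PyYAML rather than ruamel.yaml, and it assumes that repeated keys should be collected into a list, which is only one possible answer to the "how should duplicates be resolved" question raised above. The names `DuplicateKeyLoader`, `construct_mapping_keep_duplicates`, and `load_with_duplicates` are hypothetical and not part of either library. If the libyaml bindings are not installed, the sketch falls back to the slower pure-Python `SafeLoader`.

```python
import yaml

# Prefer the libyaml-backed loader; fall back to pure Python if it is missing.
try:
    from yaml import CSafeLoader as _BaseLoader
except ImportError:
    from yaml import SafeLoader as _BaseLoader


class DuplicateKeyLoader(_BaseLoader):
    """Hypothetical loader subclass, so the stock loaders stay unmodified."""


def construct_mapping_keep_duplicates(loader, node):
    """Build a dict; a key that occurs more than once maps to a list of its values.

    Caveat: a consumer cannot tell a merged list from a value that was
    already a list in the input.
    """
    loader.flatten_mapping(node)  # resolve "<<" merge keys first
    result = {}
    duplicated = set()  # keys in this mapping that were already turned into lists
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node, deep=True)
        value = loader.construct_object(value_node, deep=True)
        if key not in result:
            result[key] = value
        elif key in duplicated:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            duplicated.add(key)
    return result


# Register the constructor for every plain mapping node on the subclass only.
DuplicateKeyLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    construct_mapping_keep_duplicates,
)


def load_with_duplicates(stream):
    """Load a single document from a string or file object, keeping duplicate keys."""
    return yaml.load(stream, Loader=DuplicateKeyLoader)
```

With the three-line example input from the question, `load_with_duplicates` would return `a` as a list of two dicts and `b` as a single dict. The result should be dumped in a form with unique keys before being passed on, as suggested in the comments.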
