
Let's say I am given a YAML file called "label.yaml":

id: 1
color: red
toy: car

I want to make 10000 copies of this file, with id being the ONLY value that changes. It just needs to increase by one in each copy:

id: 1
color: red
toy: car

id: 2
color: red
toy: car

id: 3
color: red
toy: car

... And so on...

Something I've tried:

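import yaml

# Load the original file once; only the id changes between copies.
with open("label.yaml") as f:
    data = yaml.safe_load(f)

# range(1, 10001) yields ids 1 through 10000.
for i in range(1, 10001):
    data["id"] = i

    with open(f"data-{i}.yaml", "w") as f:
        yaml.dump(data, f)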

Is there a more efficient way to do this?

FelixC

1 Answer


A simple trick I might suggest: instead of calling yaml.dump or yaml.load, keep the YAML contents in a string, do a string replacement, and write the result to the file directly, without the yaml library.

The example below uses str.format to format a template string with the local variable i in each loop iteration.

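# Template for the file contents; {} is filled in with the id.
file_template = """
id: {}
color: red
toy: car
""".strip()

# range(1, 10001) yields ids 1 through 10000.
for i in range(1, 10001):
    file_contents = file_template.format(i)

    with open(f"data-{i}.yaml", "w") as f:
        f.write(file_contents)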
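If you would rather not hardcode the contents, the same idea should work with the text read straight from the file you were given. This is only a rough sketch, assuming the file has a single top-level "id:" line; the function name write_copies and the out_dir argument are made up for illustration and are not part of any library.

```python
import re
from pathlib import Path


def write_copies(template_path: Path, out_dir: Path, count: int) -> None:
    """Write `count` copies of the template, changing only the top-level id line."""
    text = template_path.read_text()
    # Assumption: the id sits on its own line, starting at column 0, as "id: <value>".
    id_line = re.compile(r"^id:.*$", flags=re.MULTILINE)
    if not id_line.search(text):
        raise ValueError("template has no top-level 'id:' line")

    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(1, count + 1):
        # count=1 replaces only the first matching line.
        contents = id_line.sub(f"id: {i}", text, count=1)
        (out_dir / f"data-{i}.yaml").write_text(contents)
```

Calling write_copies(Path("label.yaml"), Path("out"), 10000) would then write out/data-1.yaml through out/data-10000.yaml. Like the template version, this treats the file as plain text, so it only makes sense when the id line is as simple as in your example.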

NB: Per this discussion, there unfortunately doesn't seem to be a way to speed up disk I/O when writing files with open. I suspect this is close to the fastest version possible. Any long execution time you still notice is mainly due to disk I/O, which is slow by nature.
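If you want to check the disk I/O point on your own machine, a rough timing sketch like the one below may help. It compares building the strings in memory with building and writing them. The helper names time_formatting and time_writing are made up for this example, and the numbers will vary with your disk and OS caching, so treat them as a rough indication only.

```python
import tempfile
import time
from pathlib import Path

TEMPLATE = "id: {}\ncolor: red\ntoy: car\n"


def time_formatting(count: int) -> float:
    """Seconds spent building all file contents in memory (no disk access)."""
    start = time.perf_counter()
    for i in range(1, count + 1):
        TEMPLATE.format(i)
    return time.perf_counter() - start


def time_writing(count: int, out_dir: Path) -> float:
    """Seconds spent formatting and writing every file into out_dir."""
    start = time.perf_counter()
    for i in range(1, count + 1):
        with open(out_dir / f"data-{i}.yaml", "w") as f:
            f.write(TEMPLATE.format(i))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Write into a temporary directory so the test files are cleaned up afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        n = 10_000
        print(f"format only:    {time_formatting(n):.3f} s")
        print(f"format + write: {time_writing(n, Path(tmp)):.3f} s")
```

If the second number is much larger than the first, the time is going into creating and writing files rather than into producing the text.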

rv.kvetch