
I have a big yaml file:

---
foo: bar
baz:
  bacon: true
  eggs: false
---
goo: car
star:
  cheese: true
  water: false
---
dog: boxer
food:
  turkey: true
  moo: cow
---
...

What I'd like to do is split this file into n valid YAML files.

I tried doing this with csplit in bash, but I end up with either a lot more files than I want:

csplit --elide-empty-files -f rendered- example.yaml "/---/" "{*}"

or a split where the last file contains most of the content:

csplit --elide-empty-files -n 3 -f rendered- app.yaml "/---/" "{3}"

This is not ideal. What I really want is to be able to say "split this YAML file in thirds" and have it split on the closest delimiter. I know the result won't always be exact thirds.

Any ideas on how to accomplish this in bash?

mootpt
  • I am not a YAML expert, so I'm not sure what valid YAML means. For the above input, can you show the expected outputs? `csplit --elide-empty-files -f rendered- example.yaml "/---/" "{*}"` seems to produce valid files. – anishsane Sep 23 '19 at 04:42
  • @anishsane It does, yes, but what I want is a file split into, say, 3 files, where it attempts to distribute the valid YAML evenly across those 3 files, rather than splitting on `---` and having the third file contain all the remaining YAML. – mootpt Sep 23 '19 at 17:46
  • You can `grep -c '^---$'`, divide that by 3 and then use that number for `{repetition}`. e.g., if the file contains 50 entries, use `csplit --elide-empty-files -n 3 -f rendered- app.yaml "/---/" "{16}"` – anishsane Sep 24 '19 at 03:19

2 Answers


I don't think there's a way to do this with csplit. I was able to split it into files of 1000 yaml documents each with awk:

awk '/^---$/{f="rendered-"int(i++/1000)}{print > f}' app.yaml

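To get exactly three files, you could try something like the following. Note that it deals the documents out round-robin, so each file gets every third document rather than a contiguous third of the input:

awk '/^---$/{f="rendered-"(++i%3)}{print > f}' app.yaml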
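
If you'd rather have contiguous chunks (roughly the first third of the documents in one file, the next third in the second, and so on), one way is to count the documents first and then map each document index onto a file index. This is only a rough sketch: `split_yaml` is a made-up helper name, the `.yaml` suffix on the output names is my own choice, and it assumes every document starts with a line that is exactly `---`.

```shell
#!/bin/bash
# Sketch: split a multi-document YAML file into at most n contiguous files.
# split_yaml is a hypothetical helper; it assumes each document begins with
# a line that is exactly "---".
split_yaml() {
    local file=$1 n=$2 prefix=${3:-rendered-}
    local total
    # Number of documents = number of separator lines.
    total=$(grep -c -e '^---$' "$file")
    [ "$total" -gt 0 ] || return 1
    awk -v n="$n" -v total="$total" -v prefix="$prefix" '
        # Anything before the first separator goes into the first file.
        BEGIN   { f = prefix "0.yaml" }
        # Map document index i (0-based) onto a file index in 0..n-1.
        /^---$/ { f = prefix int(i++ * n / total) ".yaml" }
                { print > f }
    ' "$file"
}
```

For example, `split_yaml app.yaml 3` should write rendered-0.yaml, rendered-1.yaml and rendered-2.yaml. When the document count isn't divisible by n, some files get one more document than others, and if there are fewer documents than n you get fewer files.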
Neil

My idea is not a one-liner, but this works.

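#!/bin/bash
# Writes each YAML document in $file to its own file, named by its line range.
file=example.yaml
output=output_
# Sentinel: one past the last line, so the final document has an end boundary.
count=$(( $(wc -l < "$file") + 1 ))
# Line numbers of every separator line, followed by the sentinel.
lines="$(grep -n -e '^---$' "$file" | cut -d: -f1) ${count}"
# The first separator starts the first document; the rest mark where each one ends.
start=$(echo ${lines} | awk '{ print $1 }')
lines=$(echo ${lines} | sed 's/^[0-9]*//')

for n in ${lines}
do
    end=$((n - 1))
    sed -n "${start},${end}p" "$file" > "${output}${start}-${end}.yaml"
    start=$n
done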
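
To sanity-check a split like this, you can join the pieces back together in order and compare the result with the original. A small sketch, where `check_split` is a made-up helper name and the pieces must be passed in document order:

```shell
#!/bin/bash
# Sketch: succeed only if the pieces, joined in the order given,
# are byte-for-byte identical to the original file.
# check_split is a hypothetical helper name.
check_split() {
    local original=$1
    shift
    cat -- "$@" | cmp -s - "$original"
}
```

With the file names produced by the script, something like `check_split example.yaml $(ls output_*.yaml | sort -t_ -k2 -n)` puts the pieces in numeric order (assuming no spaces in the names). The check will fail if the input has any lines before the first `---`, since the script does not write those anywhere.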
Yuji