unix parse values from key-value pairs and print nested results

Question

I managed to parse a custom YAML file using the script below, taken from Stefan's answer to "How can I parse a YAML file from a Linux shell script?":

function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

Output:

$ parse_yaml new_export.yaml
schemas_name="exports"
schemas_tables_name="TEST1"
schemas_tables_description="'"Tracks analysis"
schemas_tables_active_date="2019-01-07 00:00:00"
schemas_tables_columns_name="event_create_ts"
schemas_tables_columns_type="timestamp without time zone"
schemas_tables_columns_name="issue_id"
schemas_tables_columns_type="bigint"
schemas_tables_columns_description="conv id"
schemas_tables_columns_example="21352352"
schemas_tables_columns_name="company_id"
schemas_tables_columns_type="bigint"
schemas_tables_columns_description="'"Tracks analysis"
schemas_tables_columns_example="10001"
schemas_tables_name="TEST2"
schemas_tables_description="This table presents funny encounters"
schemas_tables_active_date="2018-12-18 00:00:00"
schemas_tables_columns_name="instance_ts"
schemas_tables_columns_type="datetime"
schemas_tables_columns_description="|-"
schemas_tables_columns_example="2018-03-03 12:30:00"
schemas_tables_columns_name="address_id"
schemas_tables_columns_type="bigint"

How can I generate a CSV file from this output, using the nested hierarchy of each table and its columns, based on the keys?

Something like below:

exports.TEST1.event_create_ts,"timestamp without time zone"
exports.TEST1.issue_id,bigint,"conv id",21352352
exports.TEST1.company_id,bigint,"'"Tracks analysis",10001
exports.TEST2.instance_ts,datetime,"|-","2018-03-03 12:30:00"
exports.TEST2.address_id,bigint

Any help would be appreciated!
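
One possible direction, as a minimal sketch rather than a vetted solution: post-process the flat key="value" lines with a second awk pass that remembers the current schema and table and starts a new CSV row whenever a column name appears. The function name kv_to_csv is hypothetical. The sketch assumes the key names are exactly the ones shown in the output above (schemas_name, schemas_tables_name, schemas_tables_columns_*), and that a column's attributes always follow its name. It wraps any value that is not purely alphanumeric in double quotes without escaping embedded quotes, which mirrors the desired output above but is not strict CSV.

```shell
# Hypothetical helper: reads the key="value" lines printed by parse_yaml on
# stdin and prints one CSV row per column, keyed as schema.table.column.
kv_to_csv() {
   awk '
      # Print the row collected so far (if there is one) and reset it.
      function emit() { if (row != "") print row; row = "" }

      # Leave purely alphanumeric values bare; wrap everything else in quotes.
      # Assumption: embedded quotes are NOT escaped, as in the desired output.
      function field(v) { return (v ~ /^[A-Za-z0-9_]+$/) ? v : "\"" v "\"" }

      {
         eq = index($0, "=")                       # split at the first "=" only
         if (eq == 0) next
         key = substr($0, 1, eq - 1)
         val = substr($0, eq + 1)
         sub(/^"/, "", val); sub(/"$/, "", val)    # strip the outer quotes

         if      (key == "schemas_name")                { emit(); schema = val }
         else if (key == "schemas_tables_name")         { emit(); table = val }
         else if (key == "schemas_tables_columns_name") { emit(); row = schema "." table "." val }
         else if (key ~ /^schemas_tables_columns_/ && row != "") { row = row "," field(val) }
         # Table-level keys such as description and active_date are ignored.
      }

      END { emit() }
   '
}
```

It could then be chained after the existing function, for example parse_yaml new_export.yaml | kv_to_csv > new_export.csv (the output file name is only an illustration). Because it builds on the same line-oriented parsing, it inherits the limits of parse_yaml: a block scalar such as "|-" still shows up as the literal marker, not the text that follows it.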

I strongly advise against bash (or awk) for this -- the code above isn't even *close to* being a general-purpose YAML parser; that it works with your current inputs doesn't mean it'll work with YAML written by a different library or tool. Much better to use a language with a real, spec-compliant parser. — Charles Duffy, Mar 12 '19 at 16:41
I looked up yq/jshon and other binaries, but I cannot install those. I need a native solution. The YAML I have in hand will strictly follow a fixed format; the only additions would be new elements. So I am not exactly looking for a universal YAML parsing solution in this case. Is there any way to establish the relationship and output results as above? — StrangerThinks, Mar 12 '19 at 16:54
Your YAML subset will be strictly fixed until someone forgets that they have to use that subset and creates an otherwise valid YAML file, and then your MiniYAML parser breaks. Save everybody a lot of time and hassle down the road and insist on using proper YAML tools *now*. — chepner, Mar 12 '19 at 17:23
Unrelated, if you are committed enough to `bash` to use the `function` keyword, then `fs=$'\034'` is a much simpler way to define your field separator. — chepner, Mar 12 '19 at 17:29
Thanks for the input. I tried another approach to handle the YAML but didn't get much response to it. Can you check this as well? https://stackoverflow.com/questions/55105372/parse-bullet-numbered-list-to-schema-oriented-csv-in-unix — StrangerThinks, Mar 12 '19 at 17:50

