I have some documentation for an HDF5 file format that I wrote in the GraphViz DOT language. (DOT is a C-like language with lots of curly braces.) The master file contains numerous elements like this:
subgraph cluster_clustername {
...
lots of stuff including more curly braces spanning multiple lines
...
}
I want to extract one such block of text based on clustername. (I would like to create graphs of these subgraphs individually instead of one very large graph containing everything. Each subgraph cluster represents an individual HDF5 file, and the files are connected through HDF5 external soft links.)
There should be a way to extract the desired hunk of text. It is an exercise in matching the first { after a specific text pattern with its closing } across multiple lines, while accounting for nested braces. This seems like it should be a relatively common task, given the prevalence of C and C-like languages.
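A minimal Python sketch of that brace matching, assuming the cluster header is written unquoted as subgraph cluster_<name>: find the header with a regular expression, then walk forward from the first { after it, adding 1 to a depth counter on { and subtracting 1 on } until the counter returns to zero. Braces inside double-quoted strings are skipped, since DOT record labels such as "{a|b}" contain them; comments and HTML-like <...> labels are not handled. The function name extract_cluster is hypothetical.

```python
import re


def extract_cluster(dot_text: str, name: str) -> str:
    """Return 'subgraph cluster_<name> { ... }' from dot_text, braces included."""
    # \b keeps cluster_foo from also matching cluster_foobar.
    header = re.compile(r"subgraph\s+cluster_" + re.escape(name) + r"\b")
    match = header.search(dot_text)
    if match is None:
        raise KeyError(f"cluster_{name} not found")

    i = dot_text.index("{", match.end())  # first '{' after the header
    depth = 0
    in_string = False
    while i < len(dot_text):
        ch = dot_text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character, e.g. \"
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True  # braces inside "..." are label text, not structure
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # this '}' closes the cluster's own '{'
                return dot_text[match.start():i + 1]
        i += 1
    raise ValueError(f"unbalanced braces after cluster_{name}")
```

Because it scans the whole file as one string rather than line by line, the header and its opening brace may sit on different lines.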
In my mind the top candidate tools for accomplishing this are:
awk
python
gvpr, the graph stream editor provided with Graphviz (but an answer using it won't help others, say C programmers with the same question; also, few examples exist on the web and the syntax is confusing)
sed
Currently I maintain the master file and then update each of the derived files in Emacs using M-x ediff-regions-linewise, but I need an automated (so I can use Make to build the documentation files) and robust method of generating the derived files. The only tool above that I have even modest experience with is sed, but because the pattern is complicated and spans multiple lines, I think a tool like awk or Python might be better suited to the task.
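For a Make-driven build, one option is a single pass that splits every cluster out of the master file at once. This Python sketch keeps a stack of open clusters, so nested clusters are extracted too, and wraps each one in its own top-level digraph so dot can render it alone. Assumptions: the master file is a directed graph, each header is followed directly by its { (only whitespace between), and cluster names are plain word characters. The script name split_dot.py and the functions split_clusters and write_clusters are hypothetical.

```python
import re
import sys
from pathlib import Path

# Header of a cluster plus its opening brace; \w+ captures the cluster name.
HEADER = re.compile(r"subgraph\s+cluster_(\w+)\s*\{")


def split_clusters(dot_text: str) -> dict[str, str]:
    """Map each cluster name to its 'subgraph cluster_<name> {...}' text."""
    clusters = {}
    stack = []  # open clusters: (name, start offset, depth outside its braces)
    depth = 0
    pos = 0
    while pos < len(dot_text):
        ch = dot_text[pos]
        if ch == '"':
            # Skip a double-quoted string so braces in labels are ignored.
            pos += 1
            while pos < len(dot_text) and dot_text[pos] != '"':
                pos += 2 if dot_text[pos] == "\\" else 1
            pos += 1
            continue
        m = HEADER.match(dot_text, pos)
        if m:
            stack.append((m.group(1), pos, depth))
            depth += 1  # account for the header's own '{'
            pos = m.end()
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if stack and stack[-1][2] == depth:
                name, start, _ = stack.pop()
                clusters[name] = dot_text[start:pos + 1]
        pos += 1
    return clusters


def write_clusters(master: Path, outdir: Path) -> None:
    """Write one standalone graph per cluster to outdir/<name>.dot."""
    outdir.mkdir(parents=True, exist_ok=True)
    for name, block in split_clusters(master.read_text()).items():
        # Wrap the cluster in a top-level digraph so dot can render it alone.
        (outdir / f"{name}.dot").write_text(f'digraph "{name}" {{\n{block}\n}}\n')


if __name__ == "__main__":
    if len(sys.argv) == 3:
        write_clusters(Path(sys.argv[1]), Path(sys.argv[2]))
    else:
        print("usage: python split_dot.py MASTER.dot OUTDIR", file=sys.stderr)
```

A Make rule could then run python split_dot.py master.dot clusters whenever master.dot changes. Note that edges declared outside a cluster in the master file (for example, edges linking two clusters) are not copied into the per-cluster files.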
In fact, I tried a technique similar to reference counting in awk, but I am having trouble understanding some of awk's subtler behaviors, and I have only ever used awk one-liners.
Thanks so much in advance for any help you have. -Z