I'm currently trying to extract 300-odd functions and subroutines from a 22kLoC file, and decided to try to do it programmatically (I did it by hand for the 'biggest' chunks).
Consider a file of the form
declare sub DoStatsTab12( byval shortlga as string)
declare sub DoStatsTab13( byval shortlga as string)
declare sub ZOMFGAnotherSub
Other lines that start with something other than "/^sub \w+/" or "/^end sub/"
sub main
This is the first sub: it should be in the output file mainFunc.txt
end sub
sub test
This is a second sub
it has more lines than the first.
It is supposed to go to testFunc.txt
end sub
Function ConvertFileName(ByVal sTheName As String) As String
This is a function so I should not see it if I am awking subs
But when I alter the awk to chunk out functions, it will go to ConvertFileNameFunc.txt
End Function
sub InitialiseVars(a, b, c)
This sub has some arguments - next step is to parse out its arguments
Code code code;
more code;
' maybe a comment, even?
and some code which is badly indented (original code was written by a guy who didn't believe in structure or documentation)
and
with an arbitrary number of newlines between bits of code because why not?
So anyhow - the output of awk should be everything from sub InitialiseVars to end sub, and should go into InitialiseVarsFunc.txt
end sub
The gist: find sets of lines that begin with
^sub [subName](subArgs)
and end with
^end sub
And then (and here's the bit that eludes me): save the extracted subroutine to a file named [subName]Func.txt
awk suggested itself as a candidate (I have written text-extraction regex queries in PHP in the past using preg_match(), but I don't want to count on having WAMP/LAMP availability).
My starting point is the delightfully-parsimonious (double-quotes because Windows)
awk "/^sub/,/^end sub/" fName
This finds the relevant chunks (and prints them to stdout).
The step of putting the output to a file, and naming the file after $2 of the awk capture, is beyond me.
An earlier stage of this process involved awk-ing the subroutine names and storing them: that was easy, since each sub is declared by a one-liner of the form
declare sub [subName](subArgs)
So this does that, and does it perfectly -
awk "match($0, /declare sub (\w+)/)
{print substr($3, RSTART, index($3, \"(\")>0 ? index($3, \"(\")-1: RLENGTH)
> substr($3, RSTART, index($3, \"(\")>0 ? index($3, \"(\")-1: RLENGTH)\".txt\"}"
fName
(I've tried to present it so that it's easy to see that the output filename and $3 of the awk, parsed up to the first '(' if any, are the same thing).
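For comparison, the same name extraction can be sketched with sub() stripping everything from the first '(' onwards. This is only a sketch under assumptions: the file names declares.txt and names.awk are made up for the example, the keywords are assumed to be lower case as in the sample, and the program lives in a file so that no shell quoting is involved.

```shell
# Hypothetical sample input, using the declare lines shown earlier.
cat > declares.txt <<'EOF'
declare sub DoStatsTab12( byval shortlga as string)
declare sub DoStatsTab13( byval shortlga as string)
declare sub ZOMFGAnotherSub
EOF

# names.awk (hypothetical name): one output file per declared sub.
cat > names.awk <<'EOF'
$1 == "declare" && $2 == "sub" {
    name = $3
    sub(/\(.*/, "", name)     # drop "(" and anything after it
    file = name ".txt"
    print name > file
    close(file)               # keep the number of open files small
}
EOF

awk -f names.awk declares.txt
```

Running it from a file (awk -f names.awk fName) should behave the same way on Windows, since the double-quote escaping is no longer needed.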
It seems to me that if the output of
awk '/^sub/,/^end sub/' fName
was concatenated into one array, then $2 (appropriately truncated at '(' ) would work. But it didn't.
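One thing worth noting here: a range pattern only selects lines, and awk still splits and processes them one at a time, so $2 is always the second word of the current line, not of the opening sub line. A minimal sketch of one way around that is to remember the file name in a variable when the opening line is seen. The names sample.txt, extract.awk and the variable out are invented for this example, and matching is made case-insensitive with tolower() as an assumption about the real source file.

```shell
# Hypothetical sample input, cut down from the example file above.
cat > sample.txt <<'EOF'
declare sub ZOMFGAnotherSub
sub main
This is the first sub: it should be in the output file mainFunc.txt
end sub
Function ConvertFileName(ByVal sTheName As String) As String
This is a function so I should not see it if I am awking subs
End Function
sub InitialiseVars(a, b, c)
Code code code;

more code;
end sub
EOF

# extract.awk (hypothetical name): one output file per sub.
cat > extract.awk <<'EOF'
# Opening line: derive the output file name from field 2.
tolower($1) == "sub" && !out {
    name = $2
    sub(/\(.*/, "", name)     # drop "(" and anything after it
    out = name "Func.txt"
}
# While inside a sub, copy every line (including blank ones) to its file.
out { print > out }
# Closing line: close the file and leave the sub.
out && tolower($1) == "end" && tolower($2) == "sub" {
    close(out)
    out = ""
}
EOF

awk -f extract.awk sample.txt
```

With this input the sketch writes mainFunc.txt and InitialiseVarsFunc.txt and skips the declare line and the Function block. Because it tests $1 rather than anchoring at the start of the line, an indented "sub" line would also be picked up, which may or may not be wanted.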
I have looked at various SO (and other SE-family) threads that deal with multiline awk, e.g., this one and this one, but none have given me enough of a heads-up on my problem (they help with getting the match itself, but not with piping it to a file named after itself).
I have RTFD for awk (and grep), also to no avail.