This should catch your function calls, with caveats:
perl -0777 -wnE'@f = /(func1\s*\( [^;]* \))\s*;/xg; s/\s+/ /g, say for @f' tt.c
I use the fact that a statement must be terminated by ;. The flip side is that this doesn't handle an accidental ; in a comment, nor calls to func1 nested inside another call. If either is possible, then quite a bit more needs to be done to parse it.
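For reference, the switches in the one-liner are: -0777 slurps the whole file into $_, -n wraps the code in a read loop, -w turns on warnings, and -E enables say. Below is a rough script equivalent that runs the same pattern over an in-memory string. The sample C text is made up for illustration and stands in for tt.c.

```perl
use strict;
use warnings;
use feature 'say';

# Made-up stand-in for the slurped contents of tt.c
my $source = <<'END_C';
int main(void) {
    func1(a, b);
    func1 ( c,
            d ) ;
    other(e);
    func1(f2(x), y);
}
END_C

# Same pattern as the one-liner: from "func1(" up to the last ")" before the ";"
my @calls = $source =~ /(func1\s*\( [^;]* \))\s*;/xg;

s/\s+/ /g for @calls;   # collapse newlines and runs of whitespace
say for @calls;
```

Because [^;]* cannot cross a semicolon, each match stays within one statement, which is also why a stray ; inside the call would cut it short.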
However, further parsing the captured calls, presumably by commas, is complicated by the fact that a nested call may well, and realistically, contain commas. How about
func1( a, b, f2(a2, b2), c, f3(a3, b3), d );
This becomes a far more interesting little parsing problem. Or, how about macros?
Can you clarify what kinds of things one doesn't have to account for?
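To make the comma problem concrete, here is a small sketch that naively splits the argument list of the func1 call above on every comma. The variable names are mine, just for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my $args = 'a, b, f2(a2, b2), c, f3(a3, b3), d';

# Naive approach: treat every comma as an argument separator
my @naive = split /\s*,\s*/, $args;

say scalar(@naive), " pieces instead of 6 arguments:";
say "\t$_" for @naive;
```

The nested calls get torn apart into pieces like f2(a2 and b2), so the split has to be aware of parentheses somehow.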
Since it may be possible to ignore the caveats mentioned above, here is a way to parse the argument list using Text::Balanced.
Since we need to extract whole function calls if they appear as an argument, like f(a, b), the most suitable function from the library is extract_tagged. With it we can make the opening tag a word followed by a left parenthesis (\w+\() and the closing one a right parenthesis (\)).
This function extracts only the first occurrence, so it is wrapped in extract_multiple:
use warnings;
use strict;
use feature 'say';
use Text::Balanced qw(extract_multiple extract_tagged);
use Path::Tiny;  # provides path(), used here for slurp
my $file = shift // die "Usage: $0 file-to-parse\n";
my @functions = path($file)->slurp =~ /( func1\s*\( [^;]* \) )\s*;/xg;
s/\s+/ /g for @functions;
for my $func (@functions) {
    my ($args) = $func =~ /func1\s*\(\s* (.*) \s*\)/x;
    say $args;
    my @parts = extract_multiple( $args, [ sub {
        extract_tagged($args, '\\w+\\(', '\\\)', '.*?(?=\w+\()')
    } ] );
    my @arguments = grep { /\S/ } map { /\(/ ? $_ : split /\s*,\s*/ } @parts;
    s/^\s*|\s*\z//g for @arguments;
    say "\t$_" for @arguments;
}
extract_multiple returns two kinds of parts. The first are the (nested) function calls on their own (identifiable by having parens), which are arguments as they stand and are what we sought with all this. The second are strings holding comma-separated groups of the other arguments, which are then split into individual arguments.
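The post-processing step can be tried in isolation. In the sketch below the @parts list is written by hand as my guess at the shape of what extract_multiple hands back. It is an assumption for illustration, not captured output.

```perl
use strict;
use warnings;
use feature 'say';

# Hand-written stand-in for the list from extract_multiple (an assumption):
# call-like parts alternate with strings of comma-separated plain arguments.
my @parts = (
    'a, b, ',
    'f2(a2, f3(a3, b3), b2)',
    ', c, ',
    'f4(a4, b4)',
    ', d, e',
);

# Parts containing "(" are kept whole; the others are split on commas.
# grep then drops the empty strings that split leaves for a leading comma.
my @arguments = grep { /\S/ }
                map  { /\(/ ? $_ : split(/\s*,\s*/, $_) } @parts;
s/^\s+|\s+\z//g for @arguments;   # trim leftover whitespace

say "\t$_" for @arguments;
```

Note that the test for "(" is what keeps the commas inside f2(...) and f4(...) from being split.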
Note the amount of escaping in extract_tagged (found by trial and error)! This is needed because those strings are twice double-quoted in a string-eval. That isn't documented at all, so see the source (e.g. here).
Or directly produce the escape-hungry characters (\x5C for \), which then need no escaping:
extract_tagged($_[0], "\x5C".'w+'."\x5C(", '\x5C)', '.*?(?=\w+\()')
I don't know which I'd call "clearer".
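It may help to see what these literals actually hold before extract_tagged gets them. This sketch only prints the strings and their lengths; it says nothing about what the library does with them afterwards.

```perl
use strict;
use warnings;
use feature 'say';

# In single quotes only \\ and \' are escapes; any other backslash is literal.
my @literals = (
    '\\w+\\(',                  # opening tag, escaped form
    '\\\)',                     # closing tag, escaped form
    "\x5C" . 'w+' . "\x5C(",    # opening tag, built with \x5C in double quotes
    '\x5C)',                    # closing tag: \x5C is not expanded in single quotes
);

# Print each string as Perl sees it, with its length
say "$_  (", length($_), " chars)" for @literals;
```

The two opening tags come out as the same five-character string. The two closing tags differ: the escaped form holds two backslashes and a parenthesis, while the single-quoted \x5C form is passed along as literal text, so presumably it gets expanded later inside the library, which would fit the string-eval explanation.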
I tested on the file provided in the question, to which I added a function call:
func1( a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e );
For each function call, the program prints the string with the argument list to parse, followed by the parsed arguments. The most interesting part of the output is for the added call above:
[ ... ]
a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e
	a
	b
	f2(a2, f3(a3, b3), b2)
	c
	f4(a4, b4)
	d
	e
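As a cross-check, the same split can be sketched without Text::Balanced by walking the string and counting parenthesis depth. This is a different technique from the one above, and split_top_level is a hypothetical helper name of mine. It assumes the argument list has no parentheses or commas inside string literals or comments.

```perl
use strict;
use warnings;
use feature 'say';

# Split an argument list on commas that are not inside parentheses.
# split_top_level is a hypothetical helper, not part of any library.
sub split_top_level {
    my ($args) = @_;
    my ($depth, $current, @out) = (0, '');
    for my $ch (split //, $args) {
        if ($ch eq ',' and $depth == 0) {   # separator at the top level
            push @out, $current;
            $current = '';
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        $current .= $ch;
    }
    push @out, $current;                    # last argument
    s/^\s+|\s+\z//g for @out;               # trim whitespace
    return grep { length } @out;            # drop empty pieces
}

say "\t$_" for split_top_level('a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e');
```

For the test string it should give the same seven arguments as listed above.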