
Example of the function syntax in my C, C++ (.cpp) and header (.h) files:

{< //--------------------------------------------------------------------------------------------
// FORWARD DECLARATION
//--------------------------------------------------------------------------------------------


Result_t
dumpAdSidToLocalGroupsAndPriv(uint32_t                             vserverId,
                              const Asid&                          userAsid,
                              AdSidToLocalGroupsAndPrivCacheEntry& valEntry,
                              const struct timeval&                entryTime)
{
-----------------------
}

//---------------------------------------------------------------------------/>}

Example of a function defined in a .cc file:

{
smdb_error // return type
hosts_byname_iterator::apply_imp(const apply_info_t &info)
{
--------
}

Likewise for the other .c and .h files.

Using a Perl regex, I want to get only these function names as output. I pass the source files as input to the Perl script, and I want to be able to pass multiple files at once.

The code I am using is this:

{
use strict;
use warnings;

my $filename = $ARGV[0];  
my $filename1 ='report.txt';
open(my $fh1, '>>', $filename1) or die "Could not open file $filename1: $!";
print $fh1 "\n Output file \n";
my $data = do {
open my $fh, '<', $filename or die $!;
local $/;
<$fh>;
};

my $count = 0;
while ($data =~ /(.*::.*/g ) {
    my $word = $1;
    print $fh1 $word."\n";
    ++$count;
    printf "%2d: %s\n", $count, $word;
}
}

1 Answer


What you are trying to do is dangerous.

Regular expressions aren't powerful enough to parse a language as complex as C++. You can find a great discussion here (it is about HTML, but what is said there still applies). Proper C++ parsing requires a full-fledged parser. According to some comments that I read here and there while researching this topic myself, C++ is so difficult that most commercial parsers can't do it right, as there are simply too many edge cases. As suggested in the answer I have linked, however, a regexp-based approach is possible under some circumstances. But you have to make sure your data follows known patterns, and normally it is difficult to make such an assumption.

That said... Your code doesn't even compile, because the capture group in your regexp is never closed. You have to fix your regexp like this:

while ($data =~ /(.*::.*)/g ) {

But this means you will only find functions that are members of a class, and you will also get some false positives: the class::function syntax is used to call functions as well as to define them, and namespaces use the same :: notation. To cut down on the false positives, I'd look for the semicolon at the end of each declaration in the .h file. When I was trying to write my own regexp to parse C++ (before discovering that it can't be done, as explained above) I was trying to find something like this:

#!/usr/bin/perl
use strict;
use warnings;

my $data = "int& myClass::Function1();\n"
         . "void * me::function2(const int& temp, double a, char[] b);\n"
         . "double** class::function_3 (int[] array, int& result);\n";

while ($data =~ /\s*(\w+([\s&\*]*))((::)?((\w+)::)?(\w+)\s*\(([^)]*)\)\s*;)/gs ) {
    my $return_type = $1;
    my $class = $6 // '';    # empty string when the function has no class qualifier
    my $function_name = $7;
    my $arguments = $8;
    print "return_type   = $return_type\n";
    print "class         = $class\n";
    print "function_name = $function_name\n";
    print "arguments     = $arguments\n";
}
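
The question also asks about passing several files at once. Here is a minimal sketch, assuming the regexp above is applied unchanged to every file named on the command line and that the report file name report.txt from the question is kept. The helper names function_names and slurp are made up for illustration, and since the regexp requires a trailing semicolon it only catches declarations, not definitions followed by a body:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same declaration regexp as above: return type, optional class,
# function name, argument list, trailing semicolon.
my $decl_re = qr/\s*(\w+([\s&\*]*))((::)?((\w+)::)?(\w+)\s*\(([^)]*)\)\s*;)/s;

# Return the function names of all declarations found in a string.
sub function_names {
    my ($text) = @_;
    my @names;
    while ($text =~ /$decl_re/g) {
        push @names, $7;    # group 7 is the function name
    }
    return @names;
}

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open '$path': $!";
    local $/;               # undef record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Process every file given on the command line.
if (@ARGV) {
    open my $out, '>>', 'report.txt' or die "Could not open 'report.txt': $!";
    for my $file (@ARGV) {
        my $count = 0;
        for my $name (function_names(slurp($file))) {
            printf "%s %2d: %s\n", $file, ++$count, $name;
            print {$out} "$file: $name\n";
        }
    }
    close $out;
}
```

It would be called with any number of files, for example: perl script.pl a.h b.h c.h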

As you can see, this regexp is already pretty complex, and still there are a lot of cases that it can't catch (what about namespaces, templates, multi-line functions with possibly an argument + comment per line? And so on...). If you really want to go this way, try a test-based approach:

  1. Analyse the format of your data, that is, the function names that you want to consider (for example: do they use namespaces? Do they return references, pointers and so on? In that case, do they have spaces in between?)
  2. Create a test suite, that is, a list of functions called function1, function2, function3... making sure that you have one case for each possible syntax (this is the hard part, because how can you be sure that you have considered them all?)
  3. Write a regexp that covers as many cases as possible. If you can't cover all of them with one, consider using more than one (in the example I gave it would be more than one while loop). Every time you have a match, print it. At the end, check that you have found all the functions in your test.
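
The steps above can be sketched as a tiny test harness. This is only an illustration: the test declarations, the helper names extract_name and failed_cases, and the choice of cases are assumptions, and a real suite would need many more cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regexp under test (the declaration regexp from the example above).
my $decl_re = qr/\s*(\w+([\s&\*]*))((::)?((\w+)::)?(\w+)\s*\(([^)]*)\)\s*;)/s;

# Return the function name found in a declaration, or undef if none matches.
sub extract_name {
    my ($decl) = @_;
    return $decl =~ $decl_re ? $7 : undef;
}

# Step 2: one test case per syntax variant: [ declaration, expected name ].
my @cases = (
    [ 'int& MyClass::function1();',                       'function1' ],
    [ 'void * me::function2(const int& temp, double a);', 'function2' ],
    [ 'double** function3 (int[] array, int& result);',   'function3' ],
    [ 'int MyClass::function4() const;',                  'function4' ],
);

# Step 3: run the regexp on every case and collect the ones it gets wrong.
sub failed_cases {
    my @failed;
    for my $case (@cases) {
        my ($decl, $expected) = @$case;
        my $got = extract_name($decl);
        push @failed, $decl unless defined $got && $got eq $expected;
    }
    return @failed;
}

my @failed = failed_cases();
printf "%d of %d cases passed\n", @cases - @failed, scalar @cases;
print "MISSED: $_\n" for @failed;
```

With these cases, the declaration of function4 is reported as missed because the regexp does not allow a trailing const before the semicolon. That is the kind of gap the test suite is meant to expose.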

If you can do all of this, and if you have done a very good job at defining the test cases, you can succeed. But let me repeat that regular expressions aren't the right tool for this, and they work only in a limited set of cases, and even determining whether they do in yours is hard.

Again: consider a parser!
