4

I'm working on a small academic research project about extremely long and complicated functions in the Linux kernel. I'm trying to figure out whether there is a good reason to write functions that are 600 or 800 lines long.

For that purpose, I would like to find a tool that can extract a function from a .c file, so I can run some automated tests on the function.

For example, if I have the function cifs_parse_mount_options() in the file connect.c, I'm looking for a solution that would work roughly like this:

extract /fs/cifs/connect.c cifs_parse_mount_options

and return the 523 lines of code(!) of the function, from the opening brace to the closing brace.

Of course, any way of using existing software packages such as gcc to do this would be very helpful too.

Thanks,

Udi

EDIT: The answers to "Regex to pull out C function prototype declarations?" convinced me that matching function declarations by regex is far from trivial.

Adam Matan

6 Answers

3

Why don't you write a small Perl/PHP/Python script, or even a small C++, Java or C# program, that does that?

I don't know of any ready-made tools that do this, but writing code that parses the text file and extracts a function body from a C/C++ source file should not take more than about 20 lines. The only difficult part is locating the beginning of the function, and that should be a relatively simple task with a regex. After that, all you need to do is iterate through the rest of the file, keeping track of opening and closing curly braces; when you reach the closing brace of the function body, you're done.
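The approach described above (find the start with a regex, then count braces) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `mask_noise`, `extract_function` and `SAMPLE` are made up for this example, comments and string literals are blanked out first so that stray braces inside them are not counted, and the preprocessor is ignored entirely, so a function whose header or braces are produced by macros or `#if` blocks will not be found correctly.

```python
import re

# Comments and string/char literals: the places where stray braces can hide.
_NOISE = re.compile(
    r'//[^\n]*'                    # line comment
    r'|/\*.*?\*/'                  # block comment
    r'|"(?:\\.|[^"\\\n])*"'        # string literal
    r"|'(?:\\.|[^'\\\n])*'",       # character literal
    re.S,
)


def mask_noise(src):
    """Blank out comments and literals, keeping every offset and newline."""
    return _NOISE.sub(lambda m: re.sub(r'[^\n]', ' ', m.group()), src)


def extract_function(src, name):
    """Return the text of the definition of `name`, or None if not found.

    The result starts at the line containing the function name and ends
    at the closing brace of the body (comments included).
    """
    code = mask_noise(src)
    for m in re.finditer(r'\b%s\s*\(' % re.escape(name), code):
        # Walk to the parenthesis that closes the parameter list.
        i, depth = m.end(), 1
        while i < len(code) and depth:
            depth += {'(': 1, ')': -1}.get(code[i], 0)
            i += 1
        # A definition has '{' next; a call or prototype does not.
        while i < len(code) and code[i].isspace():
            i += 1
        if i >= len(code) or code[i] != '{':
            continue
        # Count braces until the body is closed again.
        depth = 0
        for k in range(i, len(code)):
            if code[k] == '{':
                depth += 1
            elif code[k] == '}':
                depth -= 1
                if depth == 0:
                    start = src.rfind('\n', 0, m.start()) + 1
                    return src[start:k + 1]
    return None


# Hypothetical input, chosen to exercise the awkward cases.
SAMPLE = '''\
int helper(int x);   /* prototype: must be skipped */

static int helper(int x)
{
    /* a stray } inside a comment */
    if (x) {
        puts("{ not a real brace");
    }
    return helper(x - 1);
}

int main(void) { return helper(3); }
'''

if __name__ == "__main__":
    print(extract_function(SAMPLE, "helper"))
```

Because the braces are counted on the masked copy but the slice is taken from the original text, the extracted function keeps its comments and layout. A return type written on the line above the function name is not included.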

Mike Dinescu
  • That was my first idea, but comments make things a little more complicated. However, if I do write such a script, I guess I'll publish it; it might be useful to others. – Adam Matan Jul 17 '09 at 15:50
  • I agree. You probably need to take comments into account, in case they contain unmatched curly braces. Still, it shouldn't be too difficult to code! – Mike Dinescu Jul 17 '09 at 15:57
  • I have to strip them out, get the location of the end of the function, and then extract the code with the comments. Another issue is the function signature on top: it can be more than one line long. As I mentioned before, it's not a big deal, but I'd rather focus on my research and use a ready-made tool. If that isn't possible, I'll Python the problem! – Adam Matan Jul 17 '09 at 15:58
  • Seriously, writing this script should take less than an hour. You have probably already wasted more time looking for a tool to do it than doing it yourself. – muusbolla Jul 17 '09 at 17:49
  • 1
    You should run the preprocessor (e.g. by `gcc -E`) before you parse the source. After preprocessing it should be easy. – Frunsi May 21 '10 at 19:30
  • Preprocessing is a bad idea if you want to keep comments or code layout. If you don't preprocess, then you may have a very bad time picking up a function whose header or tail is defined or modified by preprocessor conditionals or macros. While this is rare, in big systems everything happens, so expect to encounter it. The right answer is that any kind of hacky solution that appears to work will only appear to work; it won't work under all circumstances. If OP doesn't care about that, then fine. If he does, he'll need to modify a tool that essentially contains a preprocessor and parser. – Ira Baxter Mar 15 '15 at 07:39
1

indent -kr code -o code.out

awk -f split.awk code.out

You will have to adapt split.awk a little, since it is somewhat specific to my code and refactoring needs (for example, I have some structs that are not typedefs).

And I'm sure you can write a nicer script :-)

--
BEGIN   { line=0; FS="";
    out=ARGV[ARGC-1]  ".out";
    var=ARGV[ARGC-1]  ".var";
    ext=ARGV[ARGC-1]  ".ext";
    def=ARGV[ARGC-1]  ".def";
    inc=ARGV[ARGC-1]  ".inc";
    typ=ARGV[ARGC-1]  ".typ";
    system("rm -f " out " " var " " ext " " def " " inc " " typ);  # remove output left over from a previous run
    }
/^[     ]*\/\/.*/   { print "comment :" $0 "\n"; print $0 >> out ; next ;}
/^#define.*/        { print "define :" $0 ; print $0 >>def ; next;}
/^#include.*/       { print "include :" $0 ; print $0 >>inc ; next;}
/^typedef.*{$/      { print "typedef var :" $0 "\n"; decl="typedef";print $0 >> typ;infile="typ";next;}
/^extern.*$/        { print "extern :" $0 "\n"; print $0 >> ext;infile="ext";next;}
/^[^    }].*{$/     { print "init var :" $0 "\n";decl="var";print $0 >> var; infile="vars";
                print $0;
                fout=gensub("^([^    \\*])*[    ]*([a-zA-Z0-9_]*)\\[.*","\\2","g") ".vars";
                     print "var decl : " $0 "in file " fout;
                     print $0 >fout;
                next;
                        }
/^[^    }].*)$/     { print "func  :" $0 "\n";decl="func"; infile="func";
                print $0;
                fout=gensub("^.*[    \\*]([a-zA-Z0-9_]*)[   ]*\\(.*","\\1","g") ".func";
                     print "function : " $0 "in file " fout;
                     print $0 >fout;
                next;
            }
/^}[    ]*$/        { print "end of " decl ":" $0 "\n"; 
                if(infile=="typ") {
                    print $0 >> typ;
                }else if (infile=="ext"){
                    print $0 >> ext;
                }else if (infile=="var") {
                    print $0 >> var;
                }else if ((infile=="func")||(infile=="vars")) {
                    print $0 >> fout; 
                    fflush (fout);
                    close (fout);
                }else if (infile=="def") {
                    print $0 >> def;
                }else if (infile=="inc"){
                    print $0 >> inc;
                }else print $0 >> out;
                next;
            }
/^[a-zA-Z_]/        { print "var :" $0 "\n"; print $0 >> var;infile="var";next;}
            { print "other :" $0 "\n" ; 
                if(infile=="typ") {
                    print $0 >> typ;
                }else if (infile=="ext"){
                    print $0 >> ext;
                }else if (infile=="var") {
                    print $0 >> var;
                }else if ((infile=="func")||(infile=="vars")){
                    print $0 >> fout;
                }else if (infile=="def") {
                    print $0 >> def;
                }else if (infile=="inc"){
                    print $0 >> inc;
                }else print $0 >> out;
               next;
               }
1

In case you are finding it difficult to extract function names:

1> Use ctags (a program) to extract the function names: ctags -x --c-kinds=fp path_to_file.
2> Once you have the function names, write a simple Perl script that extracts the contents of a function, passing it the function name as described above.
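As a rough illustration of the first step, the cross-reference listing printed by `ctags -x` can be turned into a map from function name to starting line. The sketch below is in Python rather than Perl, and it rests on assumptions: it expects an Exuberant or Universal Ctags binary on the PATH, it assumes the usual column order of `-x` output (name, kind, line, file, source text), and the names `run_ctags`, `parse_xref` and `SAMPLE_LISTING` are invented for this example. The sample listing is hand-written, not captured from a real run.

```python
import subprocess


def run_ctags(path):
    """Return the raw `ctags -x` listing for one file (needs ctags on PATH)."""
    result = subprocess.run(
        ["ctags", "-x", "--c-kinds=fp", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_xref(listing):
    """Map each function name to the line number reported by ctags."""
    starts = {}
    for row in listing.splitlines():
        # Assumed columns: name, kind, line, file, source text.
        fields = row.split(None, 4)
        if len(fields) >= 4 and fields[1] == "function":
            starts[fields[0]] = int(fields[2])
    return starts


# Hand-written listing in the assumed format, for illustration only.
SAMPLE_LISTING = """\
main             function     13 c.c              int main (int argc, char *argv[])
testme           function      3 c.c              testme (void)
testme           prototype     1 c.h              int testme (void);
"""

if __name__ == "__main__":
    print(parse_xref(SAMPLE_LISTING))
```

The line numbers only mark where each function starts; finding where it ends still needs brace matching or a real parser.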

sherelock
0

The Bash builtin declare appears to provide similar functionality, but I am not sure how it is implemented. In particular, declare -F lists the names of the functions in the present environment:

declare -f quote
declare -f quote_readline

declare -f outputs the definitions of all functions in the present environment:

quote () 
{ 
    local quoted=${1//\'/\'\\\'\'};
    printf "'%s'" "$quoted"
}
quote_readline () 
{ 
    local ret;
    _quote_readline_by_ref "$1" ret;
    printf %s "$ret"
}

Finally, declare -f quote outputs the definition of the quote function alone:

quote () 
{ 
    local quoted=${1//\'/\'\\\'\'};
    printf "'%s'" "$quoted"
}

Perhaps the underlying machinery can be repurposed to meet your needs.

user2514157
0

You should use something like clang, which actually parses your source code and lets you analyse it. It can find functions in many languages, even when macros are involved. You have no chance with regular expressions.

gnasher729
0

I had a similar need to pull a function out of C code, and I found vim (the editor) well suited to it (and a bit easier), because I didn't have to write any external tools or rely on unreliable regexes, which can get tedious.

test code:

$ cat -n c.c
   1 #include <stdio.h>
   2 static int
   3 testme (void)
   4 {
   5     int i=1;
   6 
   7     if (i == 1) {
   8           printf("\nDo something\n");
   9     }
  10     return 0;
  11 }
  12 
  13 int main (int argc, char *argv[])
  14 {
  15     testme();
  16     return 0;
  17 } 

Using vim in non-interactive (ex) mode with -es:

Step 1 - Go to the start of the function with a vim search (assuming the function name is at the start of the line, followed by a space): +/<function-name>. Then print the line number: !echo line(".").

Step 2 - Move to the next closing brace at the start of a line, +/^}, and print the line number.

Step 3 - Exit the file: +q

Step 4 - Now that we have a start line number and an end line number, pipe them to sed in the form <start>,<end>p (a little massaging with paste is required before invoking sed) to dump the entire function.

Full command:

$ vim -es c.c +/'testme ' +'exec(":!echo ".line("."))'  +'/^}'  +'exec(":!echo ".line("."))'  +q | paste -sd "," - | xargs -I{} sed -n {}p c.c
testme (void)
{
    int i=1;

    if (i == 1) {
          printf("\nDo something\n");
    }
    return 0;
}
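For comparison, the same start-line/end-line idea can be sketched in a few lines of Python. It makes the same assumptions as the vim command: the function name sits at the start of a line followed by whitespace, and the body ends at the first closing brace in column zero, which holds for kernel-style formatting but not for arbitrary code. The names `extract_by_layout` and `C_SOURCE` are made up for this example.

```python
import re


def extract_by_layout(lines, name):
    """Return the lines from the one starting with `name` to the next '}' in column zero."""
    start = None
    for idx, line in enumerate(lines):
        if start is None:
            # Step 1: the function name at the start of a line, then whitespace.
            if re.match(r'%s\s' % re.escape(name), line):
                start = idx
        elif line.startswith('}'):
            # Step 2: the first closing brace at the start of a line.
            return lines[start:idx + 1]
    return None


# The test file from above, without the cat -n numbering.
C_SOURCE = r'''#include <stdio.h>
static int
testme (void)
{
    int i=1;

    if (i == 1) {
          printf("\nDo something\n");
    }
    return 0;
}

int main (int argc, char *argv[])
{
    testme();
    return 0;
}
'''

if __name__ == "__main__":
    print("\n".join(extract_by_layout(C_SOURCE.splitlines(), "testme")))
```

Like the vim version, this skips the `static int` line above the name, and it relies purely on indentation, so a brace in column zero inside a comment or a macro body would end the extraction early.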

Ani