2

I want to extract comments from C source files and know which function each comment belongs to. I have many C files like the one below:

With input:

void main()
{
    //sdgs
    call A;
    /*
    sdfgs
    dfhdfh
    */
    call b;
    some code;
}

/* this function adds
 something */
int add()
{
    //sgsd
    some code;
    //more comments
    some code;
}

Output should be:

void main()
{
    //sdgs

    /*
    sdfgs
    dfhdfh
    */

}

/* this function adds
 something */
int add()
{
    //sgsd

    //more comments

}

The input code is neatly formatted, and the function body starts on the line after the {. Basically, I just need to know which comment belongs to which function. The output should also include any comments above the function name or elsewhere. Note: this differs from plain comment extraction because the top-level function names must be kept.

To simplify my requirements:

  1. Print all comments
  2. Detect a block whose first line contains ( and which is followed, one to three lines later, by a line containing only { in the first column; print the lines immediately above that {.
sam-w
piyush

3 Answers

6

This is impossible with regular expressions; you need to write a small C parser.

Why?

First, macros have to be expanded before anything else. Second, function definitions are kind of "hard" to capture in a regular expression. Some legal function definitions:

int f() {}
const int f() {}
const char* f(int);
void f(double t);
void f(t,a) int t; int (*a)(float, char, char); {}
orlp
  • Just out of curiosity, what is: "void f(t,a) int t;" doing? – John Humphreys Nov 08 '11 at 13:22
  • @w00te: It declares a function returning nothing (void) that takes two parameters, an `int` called `t` and a pointer to a function taking a `float, char, char` as arguments and returning an `int` called `a`. I did make a (now edited) mistake, I forgot the return type of the function pointer. Also, this is a combination of old-style function declarations and function pointers. – orlp Nov 08 '11 at 13:23
  • I thought the 3rd and 4th examples would be valid declarations, but not definitions. – sidyll Nov 08 '11 at 13:29
  • @sidyll: Sure, but you can easily replace the semicolon with brackets to make them definitions, it was just an example. – orlp Nov 08 '11 at 13:30
  • OK, if line x has { in the first column and nothing else on that line, then line x-1 holds the end of the function header, ending with ), and the header extends upward a few lines to the opening (. Is it still not possible? – piyush Nov 08 '11 at 13:32
  • Although Perl regexes can do more than "usual" regular expressions (type-3 languages), writing a parser is the best way to do this, as it gets tricky when `"` and `'` come into play. You can still use regexes for the lexing part though. – mbx Nov 08 '11 at 13:35
  • I'm still confused, haha. void f(t,a) int t; - how can it take int t as a parameter when int t is outside the parenthesis? And I'm definitely not arguing, I just wanted to understand it in case I saw it again. :) – John Humphreys Nov 08 '11 at 13:39
  • my simplified requirements: 1. print all comments 2. print next lines if it contains '(' & after 1, or 2 or 3 lines, there is line containing only { at first column – piyush Nov 08 '11 at 13:43
  • @w00te: see this: http://stackoverflow.com/questions/3016213/what-is-this-strange-function-definition-syntax-in-c – orlp Nov 08 '11 at 13:43
  • Wow, that's what I get for starting in the C#/C++ era instead of the C era, haha. I've been through a thousand books and never even ran into that. Anyway, thanks for the help, +1! :) – John Humphreys Nov 08 '11 at 13:46
  • @nightcracker not quite, you'd need to name the parameter in the 3rd example :-) just a comment though – sidyll Nov 08 '11 at 15:06
2

It is perhaps not doable in a very general sense (e.g. because functions could be defined by what is apparently a macro invocation).

But if you don't care about perfection, you might make a simple lexer & parser which nearly does the job (on input code which is not too contrived).
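To illustrate what such a simple lexer might look like, here is a minimal sketch in Python. It is not a C parser: it only walks the text character by character so that comment markers inside string or character literals are not mistaken for comments. The function name `extract_comments` is made up for this example, and the sketch assumes already-preprocessed, not-too-contrived input (no line continuations inside `//` comments, no trigraphs).

```python
def extract_comments(source):
    """Return a list of (line_number, comment_text) pairs found in C source text."""
    comments = []
    i, n, line = 0, len(source), 1
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Skip a string or character literal so that /* or // inside it is ignored.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                if source[i] == "\\":
                    i += 1  # step over the backslash; the escaped char is skipped below
                if i < n and source[i] == "\n":
                    line += 1
                i += 1
            i += 1  # step over the closing quote
        elif source.startswith("//", i):
            # Line comment: runs to the end of the current line.
            end = source.find("\n", i)
            if end == -1:
                end = n
            comments.append((line, source[i:end]))
            i = end
        elif source.startswith("/*", i):
            # Block comment: runs to the next */ (or to end of input if unterminated).
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            text = source[i:end]
            comments.append((line, text))
            line += text.count("\n")
            i = end
        else:
            if ch == "\n":
                line += 1
            i += 1
    return comments
```

The recorded line numbers could then be matched against the line ranges of the functions to decide which function a comment belongs to; that part is left out here.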

I'm not sure it is useful. You should explain what you really want to achieve.

Basile Starynkevitch
  • The function name is not defined through a macro or in any other complex way. It's a function like unsigned int func_data( arg one, arg two) { – piyush Nov 08 '11 at 13:20
0

You want to print only comments and function definitions from a well-formatted file. In my opinion, this specific task is doable without a proper parser.

Try:

comments.awk:

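# Print function headers and lone braces; extend the list of return types as needed.
/^(int|void|[{}])/ { print; next }
# Print block comments, from the line with /* through the line with */.
/\/\*/,/\*\// { print; next }
# Print lines that start with a // comment (optionally indented).
/^[[:space:]]*\/\// { print; next }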

call with:

awk -f comments.awk file1
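If listing every return type is not practical, one alternative is to read the whole file and look back from each lone { instead. Below is a rough sketch of that heuristic in Python, following the simplified requirements from the question. The function name `comments_and_headers` is made up for this example. It assumes neatly formatted input where comments start their own line; trailing comments after code and headers whose continuation lines contain another ( are not handled.

```python
def comments_and_headers(source):
    """Keep comment lines, function headers, and lone braces from neatly formatted C."""
    lines = source.splitlines()
    keep = set()

    # Pass 1: a line that is exactly "{" marks a function body. Look back up to
    # three lines for the nearest line containing "(" and keep the header lines.
    for i, line in enumerate(lines):
        if line == "{":
            for j in range(i - 1, max(i - 4, -1), -1):
                if "(" in lines[j]:
                    keep.update(range(j, i + 1))
                    break
        elif line == "}":
            keep.add(i)

    # Pass 2: keep // comment lines and every line of a /* ... */ block.
    in_block = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if in_block:
            keep.add(i)
            in_block = "*/" not in line
        elif stripped.startswith("//"):
            keep.add(i)
        elif stripped.startswith("/*"):
            keep.add(i)
            in_block = "*/" not in stripped

    return [lines[i] for i in sorted(keep)]
```

Usage would be something like `print("\n".join(comments_and_headers(open("file1").read())))`. Unlike the expected output in the question, this sketch drops the code lines entirely rather than leaving blank lines in their place.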
Chris
  • But it prints function names containing int or void, which is not what I want. I think awk/sed, being line processors, cannot do this unless one can see 3 or 4 lines ahead. Thanks for your help though. – piyush Nov 08 '11 at 13:57