
For example, let's say I have this code:

#include <stdio.h>

int main(void)
{
    int x = 99;
    int *p1, *p2;
    char y[10] = "a";
    // some code
    // some code

    return 0;

}

In this code the variables are primarily of two types, int and char. Now let's say I write a program of at least 400-500 lines in which I have initialised 20-30 variables, and I want a bash script that finds the variables in it. I began with `cat code.c | grep int`, but in the scenario above that prints the `int main(void)` line as well as `int x = 99;`, and it would also print any `// some code` line that happens to contain `int`. I want to print strictly the declaration lines, such as `int x = 99;` and `int *p1, *p2;`, not every line containing `int`.

The example also has a variable of type char. How do I construct my bash script so that it greps only for particular data types, such as int, char, double and float? Is there an OR operator for this in grep? Or is there an easier way than cat and grep to do the same?

So my final output should be: `int x = 99;`, `int *p1, *p2;` and `char y[10] = "a";` :)

Looking forward to your responses.

Cyrus
Gerorge Timber
  • That is going to be really hard to do with a single regular expression. I mean ***really*** hard. It might actually be easier to write a program that recognizes generic C variable declarations, for example by using `lex` and `yacc`. – Some programmer dude Aug 13 '16 at 07:08
  • `bash`, `grep`, `sed`, `awk` and the like are not the right tools to do this. A programming language (with a few exotic exceptions) has a complex lexical and grammatical structure whose complete description frequently requires tens of pages in dedicated languages like `lex` and `yacc`, as mentioned by Joachim. Trying to do the same with regular expressions is just a bad idea. – Renaud Pacalet Aug 13 '16 at 07:13
  • That's what I thought initially, @JoachimPileborg. :( But is it possible to grep for `int`, `char` and some other chosen data types alone by using an OR in grep? For example, `cat file.c | grep int` gives me the lines containing `int`, but what if I want to print lines with the `char` data type too? How should I proceed? :) – Gerorge Timber Aug 13 '16 at 07:14
  • Great, @RenaudPacalet. I am not experienced with `lex` and `yacc` yet, but I will look into them now for this! – Gerorge Timber Aug 13 '16 at 07:15
  • You can use `ctags` for this job – qrdl Aug 13 '16 at 07:20
  • Could this maybe help? http://stackoverflow.com/questions/6261392/printing-all-global-variables-local-variables – Ely Aug 13 '16 at 07:42
  • Also: What do you want to do if an `int` or `char` is part of a structure or union? – Ely Aug 13 '16 at 07:44
  • No, only the variables which are initialized, @Elyasin. But if you have a solution for a structure or union too, then do share :) – Gerorge Timber Aug 13 '16 at 07:49
  • The C language is complex. It is impossible – read this again: *impossible* – to catch every syntactically correct variable declaration / assignment by using a regular expression. – Ingo Bürk Aug 13 '16 at 08:39
  • @IngoBürk OP wants *int*, *char* and *double* declarations, so it'll be hard and lengthy to do it with a regex but not impossible (like avoiding function declarations and string input). – Déjà vu Aug 13 '16 at 09:17
  • Declarations could be split across lines, have comments in between, use macros,... – Ingo Bürk Aug 13 '16 at 10:23
  • Could the downvoter give a reason here? I am open to answers that use any method. I hope this question isn't too bad. – Gerorge Timber Aug 13 '16 at 12:24
  • You can't, you need a language parser like [cscope](http://cscope.sourceforge.net/). That's why such tools exist. – Ed Morton Aug 13 '16 at 17:24

1 Answer


You can take an approach with grep that finds lines beginning with one or more spaces followed by int or char, using a basic regular expression (BRE) as simple as:

$ grep '^[ ][ ]*\(int\|char\)' yourfile.c
    int x = 99;
    int *p1, *p2;
    char y[10] = "a";

If the lines are indented with tabs (or a mix of spaces and tabs), you can use a Perl Compatible Regular Expression (PCRE) with GNU grep's -P option:

$ grep -P '^[ \t]+(int|char)' yourfile.c
    int x = 99;
    int *p1, *p2;
    char y[10] = "a";
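If your grep lacks -P, a POSIX extended regular expression (grep -E) with the [[:blank:]] class, which matches a space or a tab, should behave the same on mixed indentation. This is a sketch that is not part of the original answer: the helper name find_decls and the generated sample file are hypothetical.

```shell
#!/bin/sh
# find_decls (hypothetical helper): print lines that start with one or
# more blanks (spaces or tabs) followed by int or char.
# [[:blank:]] and (a|b) alternation are standard POSIX ERE, so no -P needed.
find_decls() {
    grep -E '^[[:blank:]]+(int|char)' "$1"
}

# Build a sample with mixed indentation: spaces, a tab, and a space+tab.
sample=$(mktemp)
printf '%s\n' '#include <stdio.h>' '' 'int main(void)' '{' > "$sample"
printf '    int x = 99;\n' >> "$sample"
printf '\tint *p1, *p2;\n' >> "$sample"
printf ' \tchar y[10] = "a";\n' >> "$sample"
printf '    // some code\n    return 0;\n}\n' >> "$sample"

find_decls "$sample"
rm -f "$sample"
```

On this sample it should print the three declaration lines, whichever mix of spaces and tabs indents them, and skip int main(void).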

I don't know if this is exactly what you are looking for; if not, let me know. There are other C-code scanners that will pull out and summarize variables, functions, etc. cproto is one I have used and liked a lot, and there are a number of others on SourceForge you may want to check as well.

Explanation (from comment)

Given the example, it was clear that only int, char, etc. were wanted AFTER whitespace (e.g. to skip the int main(void) declaration). With that in mind, we set up grep to require at least one space before the search term (using a BRE) or, if tabs and spaces may be mixed, at least one space or tab (using a PCRE). To handle spaces only, with a BRE:

grep '^[ ][ ]*'

or, if dealing with mixed spaces and tabs, a PCRE of

grep -P '^[ \t]+'

This anchors the search at the beginning of the line with ^ and, in the BRE, matches a single space with the character class [ ]. To require at least one space while also allowing any number of additional spaces, we add [ ]* (zero or more spaces).
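As a quick check of the one-or-more idiom, the sketch below compares [ ][ ]* against the equivalent POSIX BRE interval form [ ]\{1,\}. The helper indent_samples and its input lines are hypothetical; both commands should select only the indented lines.

```shell
#!/bin/sh
# indent_samples (hypothetical): no indentation, one space, four spaces.
indent_samples() { printf 'int main(void)\n int a;\n    int b;\n'; }

# [ ][ ]* = one space followed by zero or more spaces, i.e. one or more.
indent_samples | grep '^[ ][ ]*int'

# The same requirement written as a POSIX BRE interval expression.
indent_samples | grep '^[ ]\{1,\}int'
```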

Where multiple (or mixed) spaces and tabs are involved, the BRE above is not enough: a BRE has no \t escape, and inside a bracket expression the backslash is literal, so [ \t] would match a space, a backslash or the letter t, never a tab (a portable BRE would need the POSIX class [[:blank:]] or a literal tab instead). The PCRE does the same job for both spaces and tabs, sacrificing some portability (-P is not available in every grep) for the expanded syntax PCRE provides. ^[ \t]+ accepts one or more space or tab characters before the search terms; the + requires at least one match of the characters within the character class.
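To see the \t issue concretely, the sketch below feeds grep one tab-indented line and one line starting with the word tint, both made up for illustration via a hypothetical helper sample_lines. With [ \t] the tab-indented line is missed and the tint line matches instead, because t is in the class; with [[:blank:]] the result flips.

```shell
#!/bin/sh
# sample_lines (hypothetical): one line indented with a real tab,
# one line that starts with the letter t.
sample_lines() { printf '\tint x;\ntint = 1;\n'; }

# Inside a bracket expression the backslash is literal, so [ \t] holds
# three characters: space, backslash and the letter t (no tab).
printf '%s\n' 'BRE with [ \t]:'
sample_lines | grep '^[ \t][ \t]*int'

# [[:blank:]] is the POSIX class for space and tab.
printf '%s\n' 'BRE with [[:blank:]]:'
sample_lines | grep '^[[:blank:]][[:blank:]]*int'
```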

When looking for either int or char, the basic regular expression format is

\(int\|char\)

where ( and | have no special meaning in a BRE and must be escaped (\| is a GNU grep extension), while the PCRE form is simply:

(int|char)
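If you want an OR without relying on the GNU \| extension, POSIX grep also accepts several patterns through repeated -e options and prints a line when any of them matches. A minimal sketch, with a hypothetical helper decl_samples supplying the input:

```shell
#!/bin/sh
# decl_samples (hypothetical): two declarations and the unindented main().
decl_samples() { printf 'int main(void)\n    int x = 99;\n    char y[10] = "a";\n'; }

# Each -e adds one pattern; grep prints a line if ANY pattern matches,
# which gives an "OR" without needing \| in the BRE.
decl_samples | grep -e '^[ ][ ]*int' -e '^[ ][ ]*char'
```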

So, putting it all together, the pattern anchors a search for one or more spaces (or, with the PCRE, any mix of spaces and tabs) at the start of the line, followed by either int or char, and grep displays only the lines that match.
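The question also asked about double and float, and extending the alternation covers them. The sketch below assumes POSIX ERE (grep -E); the trailing [[:blank:]*] is an addition not in the answer, meant to skip identifiers that merely begin with a type name, such as interval. The helper find_typed_decls and the sample lines (text for grep, not a complete program) are hypothetical, and like the answer this remains a heuristic rather than a C parser.

```shell
#!/bin/sh
# find_typed_decls (hypothetical helper): indented lines whose first word is
# one of the listed types, followed by a blank or '*' so that identifiers
# merely starting with a type name (e.g. "interval") are not matched.
find_typed_decls() {
    grep -E '^[[:blank:]]+(int|char|double|float)[[:blank:]*]' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
int main(void)
{
    int x = 99;
    int *p1, *p2;
    char y[10] = "a";
    double ratio = 0.5;
    float f;
    long interval;
    interval = 3;
    return 0;
}
EOF

find_typed_decls "$sample"
rm -f "$sample"
```

It should print the five int, char, double and float declarations and leave out both interval lines and int main(void).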

Hope that helped.

David C. Rankin
  • Nice. Could you explain the regex portion in the answer, please? If I use the same command on other C code it gives no output; I think it is matching on `\t`. – Gerorge Timber Aug 13 '16 at 07:52
  • Would work for a very specific set, but function parameter declarations, and strings ("drive a char today" (from a Canadian fellow))... – Déjà vu Aug 13 '16 at 09:21
  • Yes, this is for a limited set. For larger general source parsing, I have found `cproto` provides a good tool for collecting all the *function*, *variable*, *#define*, etc. information from a collection of source files. IIRC, it was a package that isn't actively developed, but the sources are still available. The Doxygen suite of documentation tools does a good job as well, but I found it a bit of an overkill for my needs. – David C. Rankin Aug 13 '16 at 09:26
  • Which program converts the `[ \t]` into a blank-tab character class? It's not the shell; it's inside single quotes. I don't think that `grep` does such changes, either. – Jonathan Leffler Aug 13 '16 at 15:56
  • A magic shell! I blew right by the `\t` interpretation with the *BRE*; I added a *PCRE* example and cleaned up the *BRE*. Thanks. – David C. Rankin Aug 14 '16 at 03:54