I need to parse a large body of C source code and extract all structure definitions. A typical format is:

typedef struct structure1 {
field1;
field2;
.....
structure2 new_strut;
};

struct structure2 {
field3;
}my_struct;

How can I extract these structures?

marc
  • ...why would you assume that grep or sed are the right tool for the job? – Charles Duffy Oct 24 '16 at 23:10
  • (grep is quite certainly the wrong tool; sed *could* be used, but I'd certainly far rather use awk -- or just native bash, which is adequate to the task without any external tools whatsoever). – Charles Duffy Oct 24 '16 at 23:11
  • edited the question to include 'awk' – marc Oct 24 '16 at 23:13
  • Again, why are you listing specific tools and limiting your answer to them? That's a bigger problem than just awk being missing. If you want to know how to do X, ask how to do X, not how to do X in a way according to your preconceptions of which tools you might use for the job. – Charles Duffy Oct 24 '16 at 23:13
  • (Is the real constraint "using only standard UNIX tools"? Then ask it with that precise constraint; there might be another standard UNIX tool useful for the job you don't already know about). – Charles Duffy Oct 24 '16 at 23:16
  • if the structures are all separated by newlines as shown, `awk -v RS= -v ORS="\n\n" '/( |^)struct /' file`... `-v ORS="\n\n"` can be removed if you do not need the structures separated by newline – Sundeep Oct 25 '16 at 01:28

1 Answer

awk is a fairly good fit for the job:

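awk '
  # A top-level line starting with "struct " or "typedef struct " opens a definition.
  /^(typedef )?struct / { in_struct = 1 }
  # While inside a definition, print every line, including the closing one.
  in_struct { print }
  # A line starting with "}" closes the definition.
  /^}/ { in_struct = 0 }
'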
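When the same filter runs over many files, it can help to see where each definition came from. The following is a minimal sketch along those lines, not part of the script above: `list_structs` is a hypothetical helper name, and it assumes the same layout as the question (definitions start at column 0 and end with a `}` at column 0). It prefixes each printed line with awk's built-in `FILENAME` and `FNR`, and resets its state at the start of every file so an unterminated definition cannot leak into the next one.

```shell
# list_structs is a hypothetical helper name; pass it one or more C files.
list_structs() {
  awk '
    FNR == 1 { in_struct = 0 }                    # reset state at the start of each file
    /^(typedef )?struct / { in_struct = 1 }       # a definition opens at column 0
    in_struct { print FILENAME ":" FNR ": " $0 }  # tag each line with file and line number
    /^}/ { in_struct = 0 }                        # a "}" at column 0 closes it
  ' "$@"
}
```

It would be called as `list_structs *.[ch]`.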

However, you could also do it in native bash with no external tools whatsoever:

#!/bin/bash
#      ^^^^- bash, not /bin/sh

struct_start_re='^(typedef )?struct '
struct_end_re='^}'

filter_for_structs() {
  local line in_struct=0
  # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $struct_start_re ]] && in_struct=1
    if (( in_struct )); then
      printf '%s\n' "$line"
      [[ $line =~ $struct_end_re ]] && in_struct=0
    fi
  done
}

...used akin to the following:

cat *.[ch] | filter_for_structs
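Both filters above start on any column-0 line beginning with `struct `, so a plain declaration such as `struct point p;` also starts a capture that runs until the next `}` at column 0. A possible refinement, sketched here under stated assumptions rather than offered as a definitive fix, is to count braces: buffer lines from the start marker, print the buffer once the braces balance, and discard it if a `;` ends the line before any `{` was seen. `extract_structs` is a hypothetical name. The sketch assumes that comments and string literals contain no braces, and a function defined at column 0 that returns a struct would still be captured.

```shell
# extract_structs is a hypothetical helper name, not part of the answer above.
# Assumption: comments and string literals contain no "{" or "}".
extract_structs() {
  awk '
    # Start buffering at a column-0 "struct" or "typedef struct" line.
    !in_struct && /^(typedef )?struct / { in_struct = 1; depth = 0; seen = 0; buf = "" }
    in_struct {
      buf = buf $0 "\n"
      line = $0
      n_open  = gsub(/[{]/, "", line)     # number of "{" on this line
      n_close = gsub(/[}]/, "", line)     # number of "}" on this line
      depth += n_open - n_close
      if (n_open > 0) seen = 1
      if (seen && depth <= 0) {           # braces balanced: definition complete
        printf "%s", buf
        in_struct = 0
      } else if (!seen && /;[ \t]*$/) {   # declaration without a body: discard
        in_struct = 0
      }
    }
  ' "$@"
}
```

Like the awk script above, it reads the named files or, with no arguments, standard input, so `cat *.[ch] | extract_structs` works the same way. Because it counts braces instead of looking for `}` at column 0, it also tolerates an indented closing brace.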
Charles Duffy
  • Right and to get it to work with files other than the "typical format" (e.g. any that have comments or slightly different white space in them!) you'd need to strip comments using `gcc` or similar and normalize the code layout using `indent` or similar first, e.g. `sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' file.c | gcc -P -E - | sed 's/aC/#/g; s/aB/__/g; s/aA/a/g' | indent - | awk 'above script'`. See http://stackoverflow.com/a/35708616/1745001 for what the seds are doing. – Ed Morton Oct 25 '16 at 00:42