3

How can I extract the header names from a C file that contains #include lines like these?

#include <tema4header9.h>
#include    <tema4header3.h>
#include   <stdio.h>
#include        <longnametest/newheader.h>
#include <net/header.h>
#include  "last-test-Zhy3/DrRuheader.h"
#include <last-test-8fF7/a5xyheader.h>

I tried to use:

sed -n -e 's/#include[ \t]*[<"]\([^ \/<"]*\/[^ \/]*\)\.h[">]/\1\.h/p'

but it only works for headers in subdirectories. Also, if I type:

sed -n -e 's/#include[ \t]*[<"]\(([^ \/<"]*\/)+[^ \/]*\)\.h[">]/\1\.h/p'

or

sed -n -e 's/#include[ \t]*[<"]\(([^ \/<"]*\/)*[^ \/]*\)\.h[">]/\1\.h/p'

the command no longer works. The output file should look like this:

tema4header9.h
tema4header3.h
stdio.h
longnametest/newheader.h
net/header.h
last-test-Zhy3/DrRuheader.h
last-test-8fF7/a5xyheader.h
Cœur
adikinzor

4 Answers

2

grep solution: this uses a Perl-compatible regex (the -P option, available in GNU grep) and prints whatever lies between the opening "<" or '"' and the closing ">" or '"' on lines that start with #include.

grep -oP '^#include.*(<|")\K.*(?=>|")' headers
tema4header9.h
tema4header3.h
stdio.h
longnametest/newheader.h
net/header.h
last-test-Zhy3/DrRuheader.h
last-test-8fF7/a5xyheader.h

If you are ok with awk:

awk '/#include/{gsub(/<|>|"/,"",$2);print $2}' headers
tema4header9.h
tema4header3.h
stdio.h
longnametest/newheader.h
net/header.h
last-test-Zhy3/DrRuheader.h
last-test-8fF7/a5xyheader.h
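
The awk one-liner above splits on whitespace, so `$2` is the header only when `#include` is written without a space after the `#` and is followed by at least one space. As a rough sketch of one way around that (not part of the original answer), awk can split on the delimiters themselves. The function name `headers_by_delim` is made up for illustration.

```shell
# Hypothetical helper: reads C source on stdin, prints one header name per line.
headers_by_delim() {
    # -F'[<>"]' makes <, > and " the field separators, so on an
    # #include line the header name is always field 2, regardless
    # of how much whitespace surrounds "#" or "include".
    awk -F'[<>"]' '/^[ \t]*#[ \t]*include/ { print $2 }'
}

# Example run; the #define line is ignored because it is not an #include.
printf '%s\n' '#include<stdlib.h>' ' # include "mysql.h"' '#define LEN 8' | headers_by_delim
# prints:
# stdlib.h
# mysql.h
```

To run it on a file instead of stdin, something like `headers_by_delim < headers` should work.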
P....
  • Just `'(?<="|\<).*(?="|\>)'` is sufficient for `grep` – Inian Dec 23 '16 at 09:33
  • @Inian then it might extract data from unwanted lines in `file.c`, such as `cout << "hey there" << "x>y" < – P.... Dec 23 '16 at 09:39
  • I meant only the part that extracts the text within quotes or `<`; I agree that the preceding part is necessary. – Inian Dec 23 '16 at 09:41
  • Can anybody tell me how to write a regex like the grep one above for C++ `regex_search`? – Aditya kumar May 30 '20 at 19:07
  • Warning: these patterns do not capture every `#include`. Spaces and tabs are allowed around the `#` in C/C++, e.g. `# include `. A pattern that allows for this: `grep -oE '^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*(<[^>]*>|"[^"]*")' headers` – Dr. Alex RE Aug 17 '21 at 14:01
1

This should work:

sed -nr 's/#include\s+[<"]([^>"]+)[>"].*/\1/p'
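
For what it's worth, the reason this works where the attempts in the question did not: `-r` (spelled `-E` in POSIX and BSD sed, and also accepted by GNU sed) switches sed to extended regular expressions, where `(`, `)` and `+` are operators. In the default basic syntax a bare `(` is a literal character, so mixing `\(` with `(` as in the question matches nothing. Below is a minimal sketch of the same idea using `-E` and POSIX character classes instead of the GNU-specific `\s`. The helper name `extract_headers` is an assumption for illustration, and the pattern also tolerates whitespace around the `#`.

```shell
# Hypothetical helper: reads C source on stdin, prints one header name per line.
extract_headers() {
    # -n : suppress default output; only lines where s///p succeeds are printed
    # -E : extended regex, so ( ) and + need no backslashes
    # The group ([^>"]+) captures everything up to the closing > or ".
    sed -nE 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]([^>"]+)[>"].*/\1/p'
}

extract_headers <<'EOF'
#include <tema4header9.h>
#include    <tema4header3.h>
#include  "last-test-Zhy3/DrRuheader.h"
int main(void) { return 0; }
EOF
# prints:
# tema4header9.h
# tema4header3.h
# last-test-Zhy3/DrRuheader.h
```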
klarsen
0

Try:

awk 'match($0,/[<"].*[>"]/){print substr($0,RSTART+1,RLENGTH-2)}' Input_file
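
A note on how this works: `match()` sets `RSTART` to the position of the opening delimiter and `RLENGTH` to the length of the match including both delimiters, so `RSTART+1` and `RLENGTH-2` select just the text in between. The sketch below is a slightly stricter variation, not the answer's exact command. It only looks at `#include` lines and stops at the first closing delimiter. The function name `headers_by_match` is hypothetical.

```shell
# Hypothetical helper: reads C source on stdin, prints one header name per line.
headers_by_match() {
    # The first condition restricts processing to #include lines;
    # match() returns 0 (false) when nothing matches, so such lines
    # are skipped instead of producing blank output.
    awk '/^[ \t]*#[ \t]*include/ && match($0, /[<"][^<>"]*[>"]/) {
        print substr($0, RSTART + 1, RLENGTH - 2)
    }'
}

printf '%s\n' '#include <net/header.h>' 'int x = a < b;' '#include "p.h"' | headers_by_match
# prints:
# net/header.h
# p.h
```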
RavinderSingh13
0

Similar to the answers above:

sed -n 's/^\s*#\s*include\s*[<"]\(.\+\.h\)[>"].*/\1/p' input_file

but this one also tolerates whitespace around the "#". For example, if input_file contains:

 # include <stdio.h>
       #        include<stdlib.h>
    #    include    <time.h>
 #define LEN 8
 #define OPT 2
 #include <pthread.h>
 # include "mysql.h"
 #include "paths.h"

it still prints the correct output:

stdio.h
stdlib.h
time.h
pthread.h
mysql.h
paths.h
Li-Guangda