
Imagine file 1:

#include "first.h"
#include "second.h"
#include "third.h"

// more code here
...

Imagine file 2:

#include "fifth.h"
#include "second.h"
#include "eigth.h"

// more code here
...

I want to get only the `#include` lines that are in file 2 but not in file 1. So, when run, a "diff" of file 1 and file 2 should produce:

#include "fifth.h"
#include "eigth.h"

I know how to do this in Perl/Python/Ruby, but I'd like to accomplish it with standard shell tools instead of a different programming language.

B Johnson
Senthess
  • For more ways to do the same thing, take a look at this [BashFAQ](http://mywiki.wooledge.org/BashFAQ/036). Keep in mind that since all of these solutions do line-based pattern matching, you'll have to make sure you format your include lines the same way everywhere. For example, `#include` will not match `# include`, and `"first.h"` will not match `"../first.h"` from a sub-directory. – jw013 Aug 04 '11 at 08:08
  • possible duplicate of [Remove Lines from File which appear in another File](http://stackoverflow.com/questions/4366533/remove-lines-from-file-which-appear-in-another-file) – Ciro Santilli OurBigBook.com Jun 27 '15 at 08:49

5 Answers


This is a one-liner, but it does not preserve the original order:

comm -13 <(grep '#include' file1 | sort) <(grep '#include' file2 | sort)

If you need to preserve the order:

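awk '
  !/#include/ {next}                         # skip lines that are not #include directives
  FILENAME == ARGV[1] {include[$2]=1; next}  # first file: remember each header name
  !($2 in include)                           # second file: print headers not seen in the first
' file1 file2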
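A minimal sketch that wraps both commands from this answer as shell functions so they can be compared on the question's example files. The function names `new_includes_sorted` and `new_includes_ordered` are hypothetical, and the script assumes bash (for the `<(...)` process substitution) plus the standard `grep`, `sort`, `comm` and `awk` tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; usage: new_includes_sorted FILE1 FILE2
# Both print the #include lines of FILE2 that do not occur in FILE1.

# comm -13 suppresses lines unique to FILE1 (column 1) and lines common
# to both (column 3), leaving lines unique to FILE2. Both inputs must be sorted.
new_includes_sorted() {
  comm -13 <(grep '#include' "$1" | sort) <(grep '#include' "$2" | sort)
}

# awk keeps the original order of FILE2. It compares only the second
# whitespace-separated field (the quoted header name), not the whole line.
new_includes_ordered() {
  awk '
    !/#include/ {next}
    FILENAME == ARGV[1] {include[$2]=1; next}
    !($2 in include)
  ' "$1" "$2"
}
```

With the question's two files saved as `file1` and `file2`, `new_includes_ordered file1 file2` prints `#include "fifth.h"` followed by `#include "eigth.h"`, while `new_includes_sorted file1 file2` prints the same two lines in sorted order.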
glenn jackman
  • More generalized answer here: http://stackoverflow.com/a/5812853/973402; this solution is WAY faster than grep -f when you have a lot of patterns to check against – Joshua Richardson Jan 07 '14 at 21:55

If it's OK to use a temporary file, try this:

grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep include

This

  • extracts all includes from file1.h and writes them to the file /tmp/x
  • uses this file to get all lines from file2.h that are not contained in this list
  • extracts all includes from the remainder of file2.h

It probably doesn't handle differences in whitespace (and similar formatting variations) correctly, though.

EDIT: to prevent false positives, use a different pattern for the last grep (thanks to jw013 for mentioning this):

grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep "^#include"
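If a temporary file is used, a sketch along these lines avoids the fixed name `/tmp/x` (which another process could overwrite or read) by creating a private file with `mktemp`, and it applies the `-F`/`-x` options suggested in the comments on this answer. The function name `new_includes_tmpfile` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; usage: new_includes_tmpfile FILE1 FILE2
new_includes_tmpfile() {
  local patterns status
  # Private temporary file instead of a shared, fixed name like /tmp/x.
  patterns=$(mktemp) || return 1
  # The #include lines of FILE1 become the pattern list.
  grep '^#include' "$1" > "$patterns"
  # -v: drop lines of FILE2 that match a pattern; -F: treat patterns as fixed
  # strings; -x: require whole-line matches. Then keep only #include lines.
  grep -vFx -f "$patterns" "$2" | grep '^#include'
  status=$?
  rm -f "$patterns"
  return "$status"
}
```

Unlike the regex version, `-F -x` means a line such as `#include "second.h"` is removed only when the identical line exists in file1, so characters like `.` in header names are matched literally.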
rubo77
Frank Schmitt
  • Maybe change that last grep pattern to `'^#include'` unless you also want to see random lines of code where you happened to use the word "include". – jw013 Aug 04 '11 at 07:53
  • When grepping for matching lines, you should use the options `-F` for "fixed-string" (non-regexp) patterns and `-x` for "whole line" matches. Also, the temp file isn't strictly necessary; you can use `-f -` to take the pattern file from standard input. The resulting command becomes: `grep '^#include' file1.h | grep -f - -vFx file2.h | grep '^#include'` – Lee Oct 24 '13 at 00:51

This variant requires an fgrep that supports the -f option. GNU grep (i.e. on any Linux system, and then some) should work fine.

# Find occurrences of '#include' in file1.h
fgrep '#include' file1.h |
# Remove any identical lines from file2.h
fgrep -vxf - file2.h |
# Result is all lines not present in file1.h.  Out of those, extract #includes
fgrep '#include'

This does not require any sorting, nor any explicit temporary files. In theory, fgrep -f could use a temporary file behind the scenes, but I believe GNU fgrep doesn't.
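The same pipeline can be written with `grep -F`, which is the POSIX spelling; GNU grep 3.8 and later print an obsolescence warning when invoked as `fgrep`. This is a sketch rather than the answer author's code: the function name `new_includes_pipe` is hypothetical, and reading the pattern list from standard input with `-f -` assumes GNU grep.

```shell
#!/usr/bin/env bash
# Hypothetical helper; usage: new_includes_pipe FILE1 FILE2
# 1. grep -F lists the #include lines of FILE1 as fixed strings.
# 2. grep -F -v -x -f - reads those lines from stdin as patterns and removes
#    every line of FILE2 that is identical to one of them.
# 3. The last grep keeps only the #include lines of what remains.
new_includes_pipe() {
  grep -F '#include' "$1" |
    grep -F -v -x -f - "$2" |
    grep -F '#include'
}
```

Like the original, this keeps the lines in the order they appear in file2.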

tripleee

If the goal need not be accomplished with Bash alone (i.e., use of external programs is acceptable), then use combine from moreutils:

combine file2 not file1 > lines_in_file2_not_in_file1
pmocek

cat $file1 $file2 | grep '#include' | sort | uniq -u

plbogen
  • This will list `#include` lines unique to file1 or file2. I think that you want `cat $file1 $file1 $file2 | grep '#include' | sort | uniq -u`, with file1 repeated so that its `#include` lines are doubled and will then be filtered by the `uniq -u`. – esmit Dec 13 '13 at 00:19
  • And since `grep` can read multiple input files, you can use `grep -h` and do away with the (only moderately useless) `cat`. – tripleee Mar 15 '14 at 12:23
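Putting the two comments together gives roughly the following sketch (the function name `new_includes_uniq` is hypothetical). Listing file1 twice means every `#include` line from file1 occurs at least twice in the combined input, so `uniq -u` drops it, together with anything file1 shares with file2. As with the `comm` answer, the output is sorted, and an `#include` line that appears twice within file2 itself would also be dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper; usage: new_includes_uniq FILE1 FILE2
new_includes_uniq() {
  # -h stops grep from prefixing each line with its file name when given
  # several files. FILE1 is passed twice on purpose so its lines never
  # appear exactly once; uniq -u then keeps only lines unique to FILE2.
  grep -h '#include' "$1" "$1" "$2" | sort | uniq -u
}
```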