Hoping someone can help me get my head around this.

I have 2 files, one is 325 lines long, one is 361 lines long.

The bulk of these files is identical content but the 2nd one has random extra lines inserted. I am only interested in the extra lines, and I need to preserve the order in which they occur in the file.

The files contain a repeating paragraph of approximately 31 lines - I know the first and last line of this paragraph, and have no problems with dropping the entire paragraph, but can't work out how.

e.g. File1:

The quick brown
fox jumped 
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog

e.g. File2:

The quick brown
fox jumped
over the
lazy dog
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
The quick brown
fox jumped
over the
lazy dog
djakdjhgmv
asdjkljkgfyiyi
The quick brown
fox jumped
over the
lazy dog
jghytpuptou

I need to output only the extra lines in this order:

sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
djakdjhgmv
asdjkljkgfyiyi
jghytpuptou

Any help or advice would be gratefully received, I am not a *nix person unfortunately :( I tried a few diff expressions and comm expressions, but can't get what I need.

user1108364

4 Answers

3

Try this magic command:

diff file1.txt file2.txt | sed -n 's/^> \(.*\)/\1/p'

diff file1.txt file2.txt should output something like

2c2
< fox jumped 
---
> fox jumped
4a5,7
> sadhasdgh
> qyyutrytkdaslksad
> utyiuiytiuyo
8a12,13
> djakdjhgmv
> asdjkljkgfyiyi
12a18
> jghytpuptou

sed -n 's/^> \(.*\)/\1/p' finds the lines starting with > and prints them without the leading > marker. If this doesn't work, a possible reason is that diff produces a different output format on your system.
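One thing worth noting about the sample diff output above: the 2c2 hunk comes from the trailing space after "jumped" on line 2 of file1, so the pipeline would also print "fox jumped" as if it were an extra line. A minimal sketch of one way around that, assuming a diff that supports the -b option (ignore differences in the amount of whitespace, including at the end of a line); the scratch directory and the variable names plain and extra are just for illustration:

```shell
# Recreate the sample files from the question in a scratch directory.
# Note the trailing space after "jumped" on line 2 of file1.
dir=$(mktemp -d)
printf '%s\n' 'The quick brown' 'fox jumped ' 'over the' 'lazy dog' \
  'The quick brown' 'fox jumped' 'over the' 'lazy dog' \
  'The quick brown' 'fox jumped' 'over the' 'lazy dog' > "$dir/file1.txt"
printf '%s\n' 'The quick brown' 'fox jumped' 'over the' 'lazy dog' \
  sadhasdgh qyyutrytkdaslksad utyiuiytiuyo \
  'The quick brown' 'fox jumped' 'over the' 'lazy dog' \
  djakdjhgmv asdjkljkgfyiyi \
  'The quick brown' 'fox jumped' 'over the' 'lazy dog' \
  jghytpuptou > "$dir/file2.txt"

# Plain diff treats line 2 as changed, so "fox jumped" leaks into the result.
plain=$(diff "$dir/file1.txt" "$dir/file2.txt" | sed -n 's/^> \(.*\)/\1/p')

# With -b the whitespace-only difference is ignored and only real additions remain.
extra=$(diff -b "$dir/file1.txt" "$dir/file2.txt" | sed -n 's/^> \(.*\)/\1/p')

printf '%s\n' "$extra"
rm -rf "$dir"
```

If the whitespace difference is something you do want reported, leave -b out and filter the result afterwards instead.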

jaypal singh
alexander
  • Couldn't get this to work, but thanks anyway - got no output when I tried even on the files above. – user1108364 Dec 20 '11 at 18:04
  • Worked out why it didn't work for me, it's because my diff commands adds + and -, not < and > for differences - many thanks :) – user1108364 Dec 20 '11 at 18:12
1

This should work -

awk 'NR==FNR{a[$0]++;next} !($0 in a){print $0}' file1 file2

Explanation:

NR and FNR are awk's built-in variables. NR holds the total number of records read so far and is not reset when awk moves on to the next input file. FNR is the record number within the current file, so it starts again at 1 for each new file.

In this awk one-liner, the condition NR==FNR restricts the action {a[$0]++;next} to file1, because NR==FNR is only true while awk is still reading the first file. That action stores each line as a key of the array a; next skips the remaining rules for that line, so the second action is not run on file1. Once NR==FNR is no longer true, the first action is never run again, and awk applies the second action, which checks each line of file2 against the array (i.e. the contents of file1). If the line is in the array, it is ignored. If it is not, it is printed, since such lines are the extra ones that appear only in file2.
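To see the two counters side by side, here is a small sketch on two throwaway files (the file names one and two, the array name seen and the shell variable names are made up for this illustration):

```shell
# Two tiny throwaway files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' a b > "$dir/one"
printf '%s\n' a c b > "$dir/two"

# Print both counters for every record: NR keeps counting across files,
# FNR starts again at 1 when the second file begins.
counters=$(awk '{ print NR, FNR, $0 }' "$dir/one" "$dir/two")
printf '%s\n' "$counters"
# 1 1 a
# 2 2 b
# 3 1 a
# 4 2 c
# 5 3 b

# The same NR==FNR idiom as in the one-liner: remember the lines of the
# first file, then print the lines of the second file that were not seen.
only_in_two=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$dir/one" "$dir/two")
printf '%s\n' "$only_in_two"
# c

rm -rf "$dir"
```

One caveat with this idiom: if the first file is empty, NR==FNR is also true while the second file is being read, so nothing gets printed.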

Test:

File1:

[jaypal:~/Temp] cat file1
The quick brown
fox jumped 
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog

File2:

[jaypal:~/Temp] cat file2
The quick brown
fox jumped
over the
lazy dog
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
The quick brown
fox jumped
over the
lazy dog
djakdjhgmv
asdjkljkgfyiyi
The quick brown
fox jumped
over the
lazy dog
jghytpuptou

Execution:

[jaypal:~/Temp] awk 'NR==FNR{a[$0]++;next} !($0 in a){print $0}' file1 file2
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
djakdjhgmv
asdjkljkgfyiyi
jghytpuptou
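
A possible limitation: because this is a pure membership test, an extra line in file2 whose text also occurs anywhere in file1 would not be reported. If that matters for your data, a counting variant might help. This is only a sketch under that assumption, on made-up sample lines, and it still ignores where in the file a line occurs:

```shell
# Made-up sample data: file2 has one surplus "beta" plus a new "gamma".
dir=$(mktemp -d)
printf '%s\n' alpha beta > "$dir/file1"
printf '%s\n' alpha beta beta gamma > "$dir/file2"

# Membership test: the surplus "beta" is hidden because "beta" exists in file1.
by_membership=$(awk 'NR==FNR { a[$0]++; next } !($0 in a)' "$dir/file1" "$dir/file2")

# Counting variant: each line of file1 cancels at most one matching line of file2.
by_count=$(awk 'NR==FNR { a[$0]++; next } a[$0] > 0 { a[$0]--; next } { print }' "$dir/file1" "$dir/file2")

printf 'membership: %s\n' "$by_membership"
printf 'counting:\n%s\n' "$by_count"
rm -rf "$dir"
```

For the files in the question both versions give the same six lines, since none of the extra lines repeats text from file1.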
jaypal singh
0

This might work for you (GNU diff):

diff -bu file1 file2 | sed -n '1,2d;s/^+//p'
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
djakdjhgmv
asdjkljkgfyiyi
jghytpuptou
potong
0
diff -b sample.log sample.log.1 | awk '/^>/ {print substr($0, 3)}'