6

Problem:

  1. Compare two files,
  2. remove from the first file every line that also appears in the second,
  3. then append the remaining lines of file1 to file2.

Illustration by example

Suppose the two files are test1 and test2.

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6

And test1 is

$ cat test1
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5

Compare test1 to test2 and remove the duplicates from test1.

Result Required:

$ cat test1
www.xyz.com/abc-1

and then append this test1 data to test2:

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1

Solutions Tried:

join -v1 -v2 <(sort test1) <(sort test2)

which produced this (the wrong output):

$ join -v1 -v2 <(sort test1) <(sort test2)
www.xyz.com/abc-1
www.xyz.com/abc-6

Another solution I tried was:

fgrep -vf test1 test2

which printed nothing.

Andreas Louv
Ankit Jain
  • Does this answer your question? [Deleting lines from one file which are in another file](https://stackoverflow.com/questions/4780203/deleting-lines-from-one-file-which-are-in-another-file) – Pound Hash Oct 17 '22 at 23:42

4 Answers

11

Remove lines from test1 because they are in test2:

$ grep -vxFf test2 test1
www.xyz.com/abc-1

To overwrite test1:

grep -vxFf test2 test1 >test1.tmp && mv test1.tmp test1

To append the new test1 to the end of test2:

cat test1 >>test2

The grep options

grep normally prints matching lines. -v tells grep to do the reverse: it prints only the lines that do not match.

-x tells grep to do whole-line matches.

-F tells grep that we are using fixed strings, not regular expressions.

-f test2 tells grep to read those fixed strings, one per line, from file test2.
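The three steps above can be put together as a small script. This is only a sketch under stated assumptions: it recreates the sample files in a scratch directory made with mktemp (not part of the original commands), and it checks grep's exit status explicitly, because grep returns 1 when it prints no lines. With the `&&` form, that would skip the mv if every line of test1 were already in test2.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched (an assumption
# of this sketch; the commands above edit test1 and test2 in place).
dir=$(mktemp -d)
cd "$dir" || exit 1

# Recreate the sample files from the question.
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# Keep only the lines of test1 that do not appear in test2.
# grep exits 0 if it printed lines, 1 if it printed none, >1 on error,
# so accept both 0 and 1 before replacing test1.
grep -vxFf test2 test1 > test1.tmp
status=$?
if [ "$status" -le 1 ]; then
    mv test1.tmp test1
else
    rm -f test1.tmp
    exit "$status"
fi

# Append what is left of test1 to the end of test2.
cat test1 >> test2
cat test2
```

With the sample data, test1 ends up holding only www.xyz.com/abc-1, and that line becomes the sixth and last line of test2.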

John1024
8

With awk:

% awk 'NR == FNR{ a[$0] = 1;next } !a[$0]' test2 test1
www.xyz.com/abc-1

Breakdown:

NR == FNR { # Run for the first file (test2) only
  a[$0] = 1 # Store the whole line as a key in an associative array
  next      # Skip the rule below for this line
}
!a[$0]      # Print the lines from test1 that are not keys in a
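For completeness, here is a sketch of the same idea carried through all three steps. It assumes a scratch directory created with mktemp and uses a hypothetical array name, seen. It tests membership with `in`, which, unlike `!a[$0]`, does not create a new array entry for every line of test1 that it checks.

```shell
#!/bin/sh
# Scratch directory and sample data (assumptions of this sketch).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# First file (NR == FNR): remember every line of test2 as an array key.
# Second file: print the lines of test1 that are not among those keys.
awk 'NR == FNR { seen[$0]; next } !($0 in seen)' test2 test1 > test1.tmp &&
    mv test1.tmp test1

# Append the filtered test1 to test2.
cat test1 >> test2
```

One caveat: if test2 is empty, NR == FNR is also true while awk reads test1, so nothing is printed and test1 ends up empty.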
Andreas Louv
2

Solution to problems 1 and 2:

diff test1 test2 | grep '^<' | sed 's/< \+//g' > test1.tmp && mv test1.tmp test1

Here is the output:

$ cat test1
www.xyz.com/abc-1

Solution to problem 3:

cat test1 >> test2

Here is the output:

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1
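Note that diff compares the files position by position, so lines common to both files can still show up as `<` lines when they appear in a different order. A hedged alternative, which swaps in comm rather than diff, is sketched below. It assumes a scratch directory, hypothetical helper files test1.sorted and test2.sorted, and that it is acceptable for test1 to come out sorted.

```shell
#!/bin/sh
# Scratch directory and sample data; test1 is deliberately out of order
# (assumptions of this sketch).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'www.xyz.com/abc-%s\n' 5 1 3 2 4 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# comm needs sorted input; sort both files the same way.
LC_ALL=C sort test1 > test1.sorted
LC_ALL=C sort test2 > test2.sorted

# -23 suppresses columns 2 and 3, leaving the lines found only in test1.
LC_ALL=C comm -23 test1.sorted test2.sorted > test1.tmp && mv test1.tmp test1
rm -f test1.sorted test2.sorted

# Append the remaining lines to test2, which keeps its original order.
cat test1 >> test2
```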
sumitya
  • The `$ cat test1` output is `< www.xyz.com/abc-1 `. Why this `<`? – Ankit Jain May 29 '16 at 04:57
  • I have tested this in bash; which shell are you using? `sed 's/< \+//g'` already handles it. Please make sure to keep the files in the order shown in the `diff` command. – sumitya May 29 '16 at 05:51
0

If the lines in each file are unique, as shown in your sample input, then this is all you need (you already sort the input files in your attempted solutions, so sorted output must be OK):

$ sort -u test1 test2
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6

If you need something else then edit your question to clarify your requirements and provide sample input/output that would cause this to break.
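If the sorted union is what should end up in test2, the result can be written straight back with sort's -o option, which POSIX allows to name one of the input files. This is a sketch only, run in a scratch directory on the sample data (both assumptions), and it does not preserve the original line order of test2.

```shell
#!/bin/sh
# Scratch directory and sample data (assumptions of this sketch).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# Write the unique, sorted lines of both files back into test2.
# -o may safely name an input file, unlike a plain "> test2" redirect.
sort -u -o test2 test1 test2
cat test2
```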

Ed Morton
  • I guess you didn't read the question properly. I want to remove the duplicates from the test1 file and then append the result to the test2 file. – Ankit Jain May 29 '16 at 04:52
  • I read it perfectly but many times people ask for A when they actually want B and your question sounds like you are describing what you think are the steps required to solve a problem, not the problem itself. Why do you care where the lines from each file end up as long as the result is the unique set of lines from both files? – Ed Morton May 29 '16 at 13:13