1

I have two files, for example:

file 1:

1 azer 4
2 toto 0
3 blabla 8
4 riri 9
5 coco 2

file 2:

1 azer 4
2 toto 0
3 blabla 8

I want to compare the two files and remove from file 1 every line that also appears in file 2. For example:

Output:

4 riri 9
5 coco 2

I tried this command, but it only shows me the lines the two files have in common:

awk 'NR==FNR{a[$2];next} $1 in a {print $0}' merge genotype.txt

Does anyone know how to do this? I tried it in awk, but a solution in R or Python would be fine too.

Willem Van Onsem
Erika
  • 2
    Why is this tagged with R? If you need R solution see this post: [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871) – zx8754 Feb 28 '17 at 10:00
  • Would diff --suppress-common-lines suit your needs? – MKesper Feb 28 '17 at 10:08
  • 1
    Possible duplicate, see [here](http://stackoverflow.com/questions/5812756) and [here](http://stackoverflow.com/questions/4366533) – zx8754 Feb 28 '17 at 10:08
  • Thank you, I'm going to look at those posts. – Erika Feb 28 '17 at 10:57

3 Answers

2

A much simpler solution in grep-

$ cat file1
1 azer 4
2 toto 0
3 blabla 8
4 riri 9
5 coco 2

$ cat file2
1 azer 4
2 toto 0
3 blabla 8

Try-

grep -vf file2 file1

Output-

4 riri 9
5 coco 2
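
One caveat that may be worth checking on real data: without -x, grep reports a match when a pattern occurs anywhere in a line, and without -F each line of file2 is read as a regular expression. The Python sketch below does not call grep; it only imitates the two matching rules (substring versus whole line). The line `11 azer 42` and both function names are made up for the illustration and are not part of the question's data.

```python
# Illustration only: imitates substring matching vs. whole-line matching.
# The line "11 azer 42" is a hypothetical addition, not from the question.
file1 = ["1 azer 4", "11 azer 42", "4 riri 9"]
file2 = ["1 azer 4"]


def drop_substring_matches(lines, patterns):
    """Keep lines that contain none of the patterns (fixed strings, no -x)."""
    return [line for line in lines if not any(p in line for p in patterns)]


def drop_whole_line_matches(lines, patterns):
    """Keep lines that are not exactly equal to any pattern (like -F -x)."""
    wanted = set(patterns)
    return [line for line in lines if line not in wanted]


print(drop_substring_matches(file1, file2))   # "11 azer 42" is dropped as well
print(drop_whole_line_matches(file1, file2))  # only the exact line is dropped
```

On the sample files from the question both rules give the same result, so the difference only shows up when one line happens to be contained in another.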
Chem-man17
1

First, read the lines of file 2 into a set so that membership tests are fast. Then iterate over the lines of file 1 and write the lines that are not in the set to the output file, using a generator expression.

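# Load every line of file 2 into a set for fast membership tests.
with open("file2.txt") as f:
    file2 = set(f)

# Stream file 1 and write only the lines that are not in file 2.
with open("file1.txt") as fr, open("file3.txt", "w") as fw:
    fw.writelines(line for line in fr if line not in file2)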
  • the order of the lines is preserved
  • membership testing is fast
  • file 1 is never read fully into memory; the chain of iterators reads and writes the files line by line
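
A possible edge case with the approach above: the lines in the set keep their trailing newline, so if the last line of file 2 has no final newline it will not compare equal to the same line in file 1. The sketch below is one way to guard against that, assuming plain text files; the function name `remove_common_lines` and the temporary file names are hypothetical, chosen only for this demo.

```python
import os
import tempfile


def remove_common_lines(source_path, exclude_path, output_path):
    """Write the lines of source_path that do not appear in exclude_path."""
    # Strip the newline before storing, so a missing final newline is harmless.
    with open(exclude_path) as f:
        excluded = {line.rstrip("\n") for line in f}
    # Stream the source file line by line; the input order is preserved.
    with open(source_path) as src, open(output_path, "w") as out:
        for line in src:
            text = line.rstrip("\n")
            if text not in excluded:
                out.write(text + "\n")


# Demo with the question's data; file 2 deliberately lacks a final newline.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    p3 = os.path.join(tmp, "file3.txt")
    with open(p1, "w") as f:
        f.write("1 azer 4\n2 toto 0\n3 blabla 8\n4 riri 9\n5 coco 2\n")
    with open(p2, "w") as f:
        f.write("1 azer 4\n2 toto 0\n3 blabla 8")
    remove_common_lines(p1, p2, p3)
    with open(p3) as f:
        result = f.read()

print(result, end="")
```

The trade-off is that every output line gets a newline appended, even if the last line of file 1 did not have one.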
Jean-François Fabre
1
# awk
awk 'FNR==NR{a[$0];next}!($0 in a)' file2 file1

# comm
comm -23 file1 file2

# grep 
grep -Fvxf file2 file1

Input

$ cat file1
1 azer 4
2 toto 0
3 blabla 8
4 riri 9
5 coco 2

$ cat file2
1 azer 4
2 toto 0
3 blabla 8

Output

$ awk 'FNR==NR{a[$0];next}!($0 in a)' file2 file1
4 riri 9
5 coco 2

$ comm -23 file1 file2
4 riri 9
5 coco 2

$ grep -Fvxf file2 file1
4 riri 9
5 coco 2
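
Note that comm expects both inputs to be sorted, which happens to be true for the sample files; -23 suppresses the lines unique to file2 and the lines common to both, leaving the lines unique to file1. As a rough illustration of why sorting matters, here is a Python sketch of that kind of merge walk. It is not comm's actual implementation, the function name `lines_only_in_first` is made up, and it compares strings by code point, whereas comm follows the locale's collation order.

```python
def lines_only_in_first(first, second):
    """Return the lines unique to `first`; both lists must already be sorted."""
    result = []
    i = j = 0
    while i < len(first):
        if j >= len(second) or first[i] < second[j]:
            result.append(first[i])  # no counterpart in `second`: keep it
            i += 1
        elif first[i] == second[j]:
            i += 1                   # common line: skip it in both inputs
            j += 1
        else:
            j += 1                   # line only in `second`: ignore it
    return result


file1 = ["1 azer 4", "2 toto 0", "3 blabla 8", "4 riri 9", "5 coco 2"]
file2 = ["1 azer 4", "2 toto 0", "3 blabla 8"]
print(lines_only_in_first(file1, file2))

# With unsorted input the walk misses matches: "a" is wrongly kept here.
print(lines_only_in_first(["b", "a"], ["a"]))
```

The awk and grep variants do not depend on ordering, so they are the safer choice when the files are not sorted.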
Akshay Hegde