I have the two files below, each with one ID per header line. However, one of the files contains two fewer IDs than the other.

$ grep ">" output.racon-1.fasta | wc -l
6492
$ grep ">" output.racon-2.fasta | wc -l
6490

How can I find out which two IDs are missing?

FILE 1

$ grep ">" output.racon-1.fasta | head
>utg000001l
>utg000002l
>utg000003l
>utg000004l
>utg000005l
>utg000006l
>utg000007l
>utg000008l
>utg000009l
>utg000010l

$ grep ">" output.racon-1.fasta | tail
>utg006483l
>utg006484l
>utg006485l
>utg006486l
>utg006487l
>utg006488l
>utg006489l
>utg006490l
>utg006491l
>utg006492l

FILE 2

$ grep ">" output.racon-2.fasta | head
>utg000001l
>utg000002l
>utg000003l
>utg000004l
>utg000005l
>utg000006l
>utg000007l
>utg000008l
>utg000009l
>utg000010l

$ grep ">" output.racon-2.fasta | tail
>utg006483l
>utg006484l
>utg006485l
>utg006486l
>utg006487l
>utg006488l
>utg006489l
>utg006490l
>utg006491l
>utg006492l

Thank you in advance,

accdias
user977828

2 Answers

A simple diff on the sorted IDs could do the job:

diff <(grep ">" output.racon-1.fasta | sort) <(grep ">" output.racon-2.fasta | sort)
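To show how to read the result, here is a minimal sketch on two toy files. The sample data and the names a.fasta, b.fasta, a.ids and b.ids are made up for illustration, and temporary files stand in for process substitution so it also runs in shells that lack it. Lines starting with "<" exist only in the first file, and lines starting with ">" only in the second.

```shell
# Build two tiny FASTA files in a scratch directory (hypothetical sample data).
tmp=$(mktemp -d)
printf '>utg000001l\nACGT\n>utg000002l\nGGCC\n>utg000003l\nTTAA\n' > "$tmp/a.fasta"
printf '>utg000001l\nACGT\n>utg000003l\nTTAA\n' > "$tmp/b.fasta"

# Extract and sort the header lines of each file.
grep ">" "$tmp/a.fasta" | sort > "$tmp/a.ids"
grep ">" "$tmp/b.fasta" | sort > "$tmp/b.ids"

# diff exits with status 1 when the inputs differ, which is expected here.
diff_out=$(diff "$tmp/a.ids" "$tmp/b.ids" || true)
printf '%s\n' "$diff_out"

# Keep only the IDs unique to the first file ("<" lines), dropping the "< " prefix.
only_in_first=$(printf '%s\n' "$diff_out" | sed -n 's/^< //p')
printf '%s\n' "$only_in_first"

rm -rf "$tmp"
```

On your real files, the two missing IDs should show up the same way, as "<" lines.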
nullPointer

As an alternative to diff, consider join. Given sorted input, it can tell you: with no options, the lines the two files have in common; with -v1, the lines in the first file that are not present in the second; with -v2, the lines that are present only in the second file.

So, in your case, if you believe that the second file is a subset of the first file, you could retrieve the additional lines in the first file with

join -v1 <(grep ">" output.racon-1.fasta) <(grep ">" output.racon-2.fasta)

or (if the files are not sorted already)

join -v1 <(grep ">" output.racon-1.fasta | sort) <(grep ">" output.racon-2.fasta | sort)

[We're using process substitution (the <(...) expressions) to feed the results of your grep commands to join.]

Note, however, that if the second file is not a subset of the first, you'll want to either examine the output of the equivalent -v2 command as well or take the information from diff.
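As a small sketch of that last case, assume two made-up, already sorted ID lists (a.ids and b.ids are hypothetical names) where neither is a subset of the other. Running join three ways separates the IDs unique to each file from the shared ones.

```shell
# Two tiny, already sorted ID lists in a scratch directory (hypothetical data).
tmp=$(mktemp -d)
printf '>utg000001l\n>utg000002l\n>utg000003l\n' > "$tmp/a.ids"
printf '>utg000001l\n>utg000003l\n>utg000004l\n' > "$tmp/b.ids"

# Lines only in the first file.
only_a=$(join -v1 "$tmp/a.ids" "$tmp/b.ids")
# Lines only in the second file.
only_b=$(join -v2 "$tmp/a.ids" "$tmp/b.ids")
# Lines present in both files.
common=$(join "$tmp/a.ids" "$tmp/b.ids")

printf 'only in first:  %s\n' "$only_a"
printf 'only in second: %s\n' "$only_b"
printf 'in both:\n%s\n' "$common"

rm -rf "$tmp"
```

If the -v2 run prints nothing on your data, the second file really is a subset of the first and the -v1 output is the full answer.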

borrible