


I need to create a bash script that compares two files and prints each missing sequence along with the name of the file it is missing from.

For example, I have a File A that contains the items below:
ABC12.001
ABC12.002
ABC12.004
ABC12.006
ABC12.007
and another file, File B, that contains the items below:

ABC12.001
ABC12.002
ABC12.004
ABC12.006
I want the script to print something like:
"Sequence ABC12.007 is missing from File B"
How should I approach this? I'd like to share some of the code I tried, but so far I haven't been able to produce anything useful.

samone

1 Answer


Like this?

line_a=$(wc -l < A); line_b=$(wc -l < B); sum_line=$((line_a + line_b)); grep -qFxvf B A && grep -Fxvf B A | xargs -n "$sum_line" | awk '{print "Sequence "$0" is missing from File B"}'; grep -qFxvf A B && grep -Fxvf A B | xargs -n "$sum_line" | awk '{print "Sequence "$0" is missing from File A"}'

Or use a bash script like this:

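#!/bin/bash
# Usage: pass the two files to compare as the first and second argument.
A=$1
B=$2
# Count the lines in each file; the sum is an upper bound on how many
# items xargs has to join onto a single output line.
line_a=$(grep -c ^ "$A")
line_b=$(grep -c ^ "$B")
sum_line=$((line_a + line_b))
# -F: literal strings, -x: whole-line match, -v: invert, -f: read patterns from a file.
# Lines of A that are not in B:
grep -qFxvf "$B" "$A" && grep -Fxvf "$B" "$A" | xargs -n "$sum_line" | awk -v vB="$B" '{print "Sequence "$0" is missing from File "vB}'
# Lines of B that are not in A:
grep -qFxvf "$A" "$B" && grep -Fxvf "$A" "$B" | xargs -n "$sum_line" | awk -v vA="$A" '{print "Sequence "$0" is missing from File "vA}'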
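
One caveat with grep -vf is worth checking before relying on it: each line of the pattern file is treated as a regular expression and may match anywhere in a line. A missing entry can therefore be hidden when it merely contains an entry from the other file. The sketch below uses made-up file contents in a temporary directory (not the data from the question) to compare plain -vf with -Fxvf, where -F makes patterns literal and -x requires a whole-line match.

```shell
#!/bin/bash
# Hypothetical data: ABC12.0010 exists only in A, but it contains
# B's only line, ABC12.001, as a substring.
dir=$(mktemp -d)
printf 'ABC12.001\nABC12.0010\n' > "$dir/A"
printf 'ABC12.001\n' > "$dir/B"

# Plain -vf: ABC12.001 is a regex matched anywhere in the line, so both
# lines of A count as "present in B" and nothing is reported.
# "|| true" keeps grep's non-zero exit status (no lines selected) harmless.
plain=$(grep -vf "$dir/B" "$dir/A" || true)

# -F: patterns are literal strings; -x: a pattern must match the whole line.
strict=$(grep -Fxvf "$dir/B" "$dir/A" || true)

echo "plain:  ${plain:-nothing reported}"
echo "strict: ${strict:-nothing reported}"
rm -rf "$dir"
```

With this data the plain form reports nothing, while the strict form reports ABC12.0010.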


Victor Lee
  • This assumes that each line is unique and that order is unimportant, which means you could reduce this to just `sort + comm` or even `sort | uniq` – tripleee Mar 30 '21 at 13:27
  • This suggested solution was downvoted; it's a shame that reasons are not required. I would suggest a slight performance enhancement: replace wc -l with $(grep -c ^ A ... – irnerd Mar 30 '21 at 13:42
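
The sort + comm idea from the first comment could look roughly like the sketch below. It assumes, as that comment does, that order does not matter and that duplicate lines can be collapsed. The function name report_missing and the temporary files are made up for this illustration; they are not part of the original answer.

```shell
#!/bin/bash
# report_missing FILE_A FILE_B
# Prints one message for every line that appears in only one of the files.
report_missing() {
    a=$1
    b=$2
    # comm requires sorted input; sort -u also drops duplicate lines.
    sorted_a=$(mktemp)
    sorted_b=$(mktemp)
    sort -u "$a" > "$sorted_a"
    sort -u "$b" > "$sorted_b"

    # -23 suppresses columns 2 and 3, leaving lines found only in A.
    comm -23 "$sorted_a" "$sorted_b" |
        while IFS= read -r seq; do
            printf 'Sequence %s is missing from File %s\n' "$seq" "$b"
        done

    # -13 suppresses columns 1 and 3, leaving lines found only in B.
    comm -13 "$sorted_a" "$sorted_b" |
        while IFS= read -r seq; do
            printf 'Sequence %s is missing from File %s\n' "$seq" "$a"
        done

    rm -f "$sorted_a" "$sorted_b"
}
```

Calling report_missing A B on the two files from the question should print a single line saying that ABC12.007 is missing from B. Unlike the xargs version, this prints one message per missing entry rather than joining them.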