I need to write a bash script that compares two files and prints each missing entry along with the name of the file it is missing from.
For example, File A contains the items below:
ABC12.001
ABC12.002
ABC12.004
ABC12.006
ABC12.007
Another file, File B, contains the items below:
ABC12.001
ABC12.002
ABC12.004
ABC12.006
I want the output to look something like this:
"Sequence ABC12.007 is missing from File B"
How should I approach this? I would have liked to share the code I tried, but so far I haven't been able to produce anything useful.
samone
`diff file_a file_b` or `comm -3 file_a file_b`? – Cyrus Mar 28 '21 at 19:55
1 Answer
Like this?
line_a=$(wc -l < A); line_b=$(wc -l < B); sum_line=$((line_a + line_b)); grep -qFxvf B A && grep -Fxvf B A | xargs -n "$sum_line" | awk '{print "Sequence "$0" is missing from File B"}'; grep -qFxvf A B && grep -Fxvf A B | xargs -n "$sum_line" | awk '{print "Sequence "$0" is missing from File A"}'
Or use a bash script like this:
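#!/bin/bash
# Usage: ./script.sh file_a file_b
A=$1
B=$2
# Count the lines in each file; their sum is an upper bound on the number of missing entries.
line_a=$(grep -c ^ "$A")
line_b=$(grep -c ^ "$B")
sum_line=$((line_a + line_b))
# -F: fixed strings, -x: whole-line match, -v: invert the match, -f: read patterns from a file.
grep -qFxvf "$B" "$A" && grep -Fxvf "$B" "$A" | xargs -n "$sum_line" | awk -v vB="$B" '{print "Sequence "$0" is missing from File "vB}'
grep -qFxvf "$A" "$B" && grep -Fxvf "$A" "$B" | xargs -n "$sum_line" | awk -v vA="$A" '{print "Sequence "$0" is missing from File "vA}'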
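One thing to watch with `grep -vf` on this kind of data: by default each line of the pattern file is read as a regular expression and may match anywhere in a line, so the `.` in `ABC12.001` also matches `ABC12x001`, and `ABC12.001` also matches inside `ABC12.0011`. A minimal sketch of an exact, whole-line comparison follows; the helper name `only_in` is hypothetical and only there for illustration.

```shell
#!/bin/bash
# only_in FILE_X FILE_Y  (hypothetical helper name)
# Print the lines of FILE_X that have no exact, whole-line match in FILE_Y.
only_in() {
    # -F: treat patterns as fixed strings, not regular expressions
    # -x: a pattern must match the whole line
    # -v: keep the lines that do NOT match
    # -f: read the patterns from FILE_Y
    grep -Fxv -f "$2" "$1"
}
```

Called as `only_in A B`, this lists the entries of A that are absent from B, one per line.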

Victor Lee
This assumes that each line is unique and that order is unimportant, which means you could reduce this to just `sort + comm` or even `sort | uniq` – tripleee Mar 30 '21 at 13:27
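The `sort` + `comm` idea from the comment above could be sketched roughly as follows. This assumes, as the comment does, that each line is unique and that order does not matter; the function name `report_missing` and the temporary files are illustrative choices, not part of the original answer.

```shell
#!/bin/bash
# report_missing FILE_A FILE_B  (hypothetical helper name)
# Print one message per entry that occurs in only one of the two files.
report_missing() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || return 1
    # comm requires sorted input; sort -u also drops duplicate lines.
    sort -u "$1" > "$sorted_a"
    sort -u "$2" > "$sorted_b"
    # comm -23 keeps lines only in the first input, i.e. missing from FILE_B.
    comm -23 "$sorted_a" "$sorted_b" |
        awk -v name="$2" '{print "Sequence " $0 " is missing from File " name}'
    # comm -13 keeps lines only in the second input, i.e. missing from FILE_A.
    comm -13 "$sorted_a" "$sorted_b" |
        awk -v name="$1" '{print "Sequence " $0 " is missing from File " name}'
    rm -f "$sorted_a" "$sorted_b"
}
```

Called as `report_missing A B`, it prints one message per missing entry rather than joining them all on a single line.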
This suggested solution was downvoted; a shame that reasons are not required. I would suggest a slight performance enhancement: replace wc -l $(grep -c ^ A ... – irnerd Mar 30 '21 at 13:42