I just started learning Linux shell scripting. For version control, I have to compare these two files in a shell script. Example:

file1.txt

  • 275caa62391ff4f3096b1e8a4975de40 apple
  • awd6s54g64h6se4h6se45wahae654j6 ball
  • e4rby1s6y4653a46h153a41bqwa54tvi cat
  • r53aghe4354hr35a4hr65a46eeh5j45ro castor

file2.txt

  • 275caa62391ff4f3096b1e8a4975de40 apple
  • js65fg4a64zgr65f4w65ea465fa65gh7 ball
  • wroghah4a65ejdtse5z4g6sa7H658aw7 candle
  • wagjh54hr5ae454zrwrh354aha4564re castor

How can I sort the entries of these text files into newly added (present in file 2 but not in file 1), deleted (present in file 1 but not in file 2) and changed (same name but different checksum)? I tried diff, bcompare and vimdiff, but I am not getting proper output as a text file.

Thanks in advance

karkator

2 Answers

I don't know whether such a command exists, but I've taken the liberty of writing a sorting mechanism in Bash for you. It relies on namerefs (local -n), so it needs Bash 4.3 or newer. Although it's optimised, I suggest you recreate it in a language of your own choice.

#! /bin/bash

# Sets the array delimiter to a newline
IFS=$'\n'

# If $1 is empty, default to 'file1.txt'. Same for $2.
FILE1=${1:-file1.txt}
FILE2=${2:-file2.txt}

DELETED=()
ADDED=()
CHANGED=()

# Loop over array $1 and print content
function array_print {
        # -n creates a "pointer" to an array. This
        # way you can pass large arrays to functions.
        local -n array=$1
        echo "$1: "

        # "${array[@]}" expands to every element; "${array}" is only the first.
        for i in "${array[@]}"; do
                echo "$i"
        done
}

# This function loops over the entries in file_in and looks each
# name up in file_tst. Unless an identical line is found there,
# the callback is executed.
function array_sort {
        local file_in="$1"
        local file_tst="$2"
        local callback=${3:-true}
        local -n arr0=$4
        local -n arr1=$5

        while read -r line; do

                tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
                tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
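                # Quote the arguments and anchor the match to the name column,
                # so a name cannot match as part of a longer name or a hash.
                hit=$(grep -- " ${tst_name}\$" "$file_tst")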

                # If found, skip. Nothing is changed.
                [[ $hit != $line ]] || continue

                # Run callback
                $callback "$hit" "$line" arr0 arr1

        done < "$file_in"
}

# If tst is empty, line will be added to not_found. For file 1 this 
# means that file doesn't exist in file2, thus is deleted. Otherwise
# the file is changed.
function callback_file1 {
        local tst=$1
        local line=$2
        local -n not_found=$3
        local -n found=$4

        if [[ -z $tst ]]; then
                not_found+=($line)
        else
                found+=($line)
        fi
}

# If tst is empty, line will be added to not_found. For file 2 this
# means that file doesn't exist in file1, thus is added. Since the 
# callback for file 1 already filled all the changed files, we do 
# nothing with the fourth parameter.
function callback_file2 {
        local tst=$1
        local line=$2
        local -n not_found=$3

        if [[ -z $tst ]]; then
                not_found+=($line)
        fi
}

array_sort "$FILE1" "$FILE2" callback_file1 DELETED CHANGED 
array_sort "$FILE2" "$FILE1" callback_file2 ADDED CHANGED 

array_print ADDED
array_print DELETED
array_print CHANGED
exit 0

Since the code above might be hard to follow, I've also written out the two loops without the functions and callbacks. I hope it helps :-)

while read -r line; do
       tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
       tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
       hit=$(grep -- " ${tst_name}\$" "$FILE2")

       # If found, skip. Nothing is changed.
       [[ $hit != $line ]] || continue

       # If name does not occur, it's deleted (exists in 
       # file1, but not in file2)
       if [[ -z $hit ]]; then
               DELETED+=($line)
       else
       # If name occurs, it's changed. Otherwise it would
       # not come here due to previous if-statement.
               CHANGED+=($line)
       fi
done < "$FILE1"

while read -r line; do
       tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
       tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
       hit=$(grep -- " ${tst_name}\$" "$FILE1")

       # If found, skip. Nothing is changed.
       [[ $hit != $line ]] || continue

       # If name does not occur, it's added. (exists in 
       # file2, but not in file1)
       if [[ -z $hit ]]; then
               ADDED+=($line)
       fi
done < "$FILE2"
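
For lists with many lines, one grep call per entry can get slow. Below is a minimal sketch of the same classification with Bash associative arrays (Bash 4 or newer), so each file is read only once. The function name classify_checksums and the added/changed/deleted labels are my own and not part of the script above. It assumes every line is a checksum, whitespace, and then a name.

```shell
#!/usr/bin/env bash
# classify_checksums OLD NEW: hypothetical helper, prints "<status> <name>".
classify_checksums() {
    local old_file=$1 new_file=$2
    local -A old_hash=() new_hash=()
    local hash name

    # Load "hash name" pairs; the name becomes the lookup key.
    while read -r hash name; do
        [[ -n $name ]] && old_hash[$name]=$hash
    done < "$old_file"
    while read -r hash name; do
        [[ -n $name ]] && new_hash[$name]=$hash
    done < "$new_file"

    # Names from the old list: missing in the new list means deleted,
    # present with a different checksum means changed.
    for name in "${!old_hash[@]}"; do
        if [[ -z ${new_hash[$name]+x} ]]; then
            echo "deleted $name"
        elif [[ ${old_hash[$name]} != "${new_hash[$name]}" ]]; then
            echo "changed $name"
        fi
    done

    # Names that only the new list knows were added.
    for name in "${!new_hash[@]}"; do
        [[ -n ${old_hash[$name]+x} ]] || echo "added $name"
    done
}
```

Called as classify_checksums file1.txt file2.txt | sort, this should report candle as added, ball and castor as changed, and cat as deleted for the sample data in the question. Associative arrays have no defined iteration order, hence the sort.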
Bayou
  • It looks like this Bash program compares the data for only four lines. I wanted to compare two text files that have many lines. Thank you so much for this code. – karkator Jan 21 '20 at 07:45
  • @karkator Why do you think that? Data is read from the files, regardless of their length. – Bayou Jan 21 '20 at 09:05

Files which are only in file1.txt:

 awk 'NR==FNR{a[$2];next} !($2 in a)' file2.txt file1.txt > only_in_file1.txt

Files which are only in file2.txt:

 awk 'NR==FNR{a[$2];next} !($2 in a)' file1.txt file2.txt > only_in_file2.txt

For the entries whose checksum changed, use something like this answer: awk compare columns from two files, impute values of another column

e.g:

awk 'FNR==NR{a[$2]=$1;next}{print $0, (($2 in a) ? (a[$2]==$1 ? "OK" : "NA") : "MISSING")}' file2.txt file1.txt | grep ' NA$' | awk '{print $1,$2}' > md5sdiffer.txt

You'll need to decide how you want to present these, though.

There might be a more elegant way to handle the final example (as opposed to finding the lines marked NA and then re-filtering), but it's enough to go on.
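
The three steps can probably be combined into one awk run. Here is a rough sketch, assuming every line is a checksum followed by a name without spaces. The function name split_by_status and the output names added.txt, deleted.txt and changed.txt are placeholders of mine.

```shell
#!/bin/sh
# split_by_status OLD NEW: hypothetical helper, writes one file per category.
split_by_status() {
    # Remove results of an earlier run; awk only creates files it writes to.
    rm -f added.txt deleted.txt changed.txt

    awk '
        # First file: remember the checksum stored for each name.
        NR == FNR { old[$2] = $1; next }

        # Second file: unknown names were added, known names with a
        # different checksum were changed. Appending "" forces a string
        # comparison even if a checksum happens to look like a number.
        {
            seen[$2] = 1
            if (!($2 in old))            print > "added.txt"
            else if (old[$2] != ($1 "")) print > "changed.txt"
        }

        # Names from the first file that never showed up again were deleted.
        END {
            for (name in old)
                if (!(name in seen)) print old[name], name > "deleted.txt"
        }
    ' "$1" "$2"
}
```

With the sample data, split_by_status file1.txt file2.txt should leave the candle line in added.txt, the cat line in deleted.txt, and the ball and castor lines (with their new checksums) in changed.txt. A category without entries produces no file, and the order of lines in deleted.txt is not guaranteed.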

bob dylan