I have many .txt files (named 1.txt, 2.txt, 3.txt, etc.) saved in a directory, and I want to check whether the contents of the files in that directory are the same.

Every file should be compared with every other file: if the contents are the same, print yes; if they are not, print no.

For example:

1.txt

a
b
c

2.txt

a
b
c

3.txt

1
2
3


Expected output when comparing 1.txt and 2.txt:

1.txt 2.txt yes

Expected output when comparing 1.txt and 3.txt:

1.txt 3.txt no

Expected output when comparing 2.txt and 3.txt:

2.txt 3.txt no

I tried this script:

#!/bin/sh
for file in /home/nir/dat/*.txt
do
echo $file
diff $file $file+1
done

The problem is that it does not produce the expected output. Please suggest a better solution. Thanks.

manas
  • 1
    There are a lot of questions like this; just search this forum. A utility that does that is `fdupes`, for example. – Jetchisel Jul 10 '21 at 12:07
  • What is `$file+1` supposed to mean? – Barmar Jul 10 '21 at 12:20
  • 1
    first check what you get with `$file+1`. I get filename like `data.txt+1` – furas Jul 10 '21 at 12:21
  • 2
    Instead of comparing all the files, I suggest creating a list of MD5 checksums. Then test if there are any duplicates. – Barmar Jul 10 '21 at 12:23
  • 1
    `md5sum *.txt | cut -d' ' -f1 | sort | uniq -d` – Barmar Jul 10 '21 at 12:31
  • That means there are no duplicates. – Barmar Jul 10 '21 at 12:39
  • Does this answer your question? [How do I find duplicate files by comparing them by size (ie: not hashing) in bash](https://stackoverflow.com/questions/61584817/how-do-i-find-duplicate-files-by-comparing-them-by-size-ie-not-hashing-in-bas) – Jetchisel Jul 10 '21 at 13:01
  • What is the expected output? Let's say you have files `a`, `b` and `c`, where `a` and `b` have the same contents but `b` and `c` differ. Please edit the expected output into your question; do not post it as a comment. – James Brown Jul 10 '21 at 13:06
  • So basically compare every file with every other file. – James Brown Jul 10 '21 at 13:16
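
The checksum idea from the comments can be sketched roughly as follows. This is only an illustration, not a tested tool: `list_duplicate_groups` is a hypothetical name, and the sketch assumes GNU coreutils (`md5sum`, plus the GNU-only `uniq -w` and `--all-repeated` options) and at least one `.txt` file in the directory. It prints only the files whose checksum is shared with another file, one group per block.

```shell
#!/bin/sh
# Sketch: list .txt files in a directory that share an MD5 checksum.
# list_duplicate_groups is a hypothetical helper name (not from the thread).
# Assumes GNU coreutils: md5sum, and uniq with -w / --all-repeated.
# The body runs in a subshell ( ... ), so the cd does not affect the caller.
list_duplicate_groups() (
    cd "${1:-.}" &&
        # md5sum prints "<32 hex digits>  <name>"; sort brings equal digests
        # together, and uniq -w32 compares only the digest column.
        md5sum -- *.txt | sort | uniq -w32 --all-repeated=separate
)
```

Calling `list_duplicate_groups /home/nir/dat` on the three example files should list 1.txt and 2.txt under the same checksum and omit 3.txt. Unlike the pairwise yes/no output, this reads each file only once, which may matter when there are many files.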

2 Answers

Something like this in bash:

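for i in *
do
  for j in *
  do
    # compare each unordered pair once: only when "$i" sorts before "$j"
    if [[ "$i" < "$j" ]]
    then
      # cmp -s prints nothing and exits 0 when the files are identical
      if cmp -s "$i" "$j"
      then
        echo "$i $j equal"
      else
        echo "$i $j differ"
      fi
    fi
  done
done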

Output:

1.txt 2.txt equal
1.txt 3.txt differ
2.txt 3.txt differ
James Brown

One idea, using an array of the filenames and borrowing James Brown's cmp solution:

# load list of files into array flist[]

flist=(*)

# iterate through all combinations; '${#flist[@]}' ==> number of elements in array

for ((i=0; i<${#flist[@]}; i++))
do
    for ((j=i+1; j<${#flist[@]}; j++))
    do
         # default status = "no" (ie, are files the same?)

         status=no

         # cmp -s exits with status 0 (success) when the files are identical,
         # so the follow-on assignment (status=yes) is executed

         cmp -s "${flist[${i}]}" "${flist[${j}]}" && status=yes

         echo "${flist[${i}]} ${flist[${j}]} ${status}"
    done
done

For the 3 files listed in the question this generates:

1.txt 2.txt yes
1.txt 3.txt no
2.txt 3.txt no
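
Both loops above rely on bash features (`[[ ... ]]`, arrays, `(( ... ))`), while the script in the question starts with `#!/bin/sh`. As a rough sketch only, the same pairwise comparison can be written for plain POSIX sh by using the positional parameters as the file list. `compare_pairs` is a hypothetical name; the sketch assumes the files match `*.txt`, and `first`, `other` and `status` are ordinary global variables because POSIX sh has no `local`.

```shell
#!/bin/sh
# Sketch: compare every .txt file in a directory with every later one.
# compare_pairs is a hypothetical helper name (not from the thread).
compare_pairs() {
    # Replace this function's positional parameters with the matching files.
    set -- "${1:-.}"/*.txt
    [ -e "$1" ] || return 0      # the glob matched nothing

    while [ "$#" -gt 1 ]; do
        first=$1
        shift                    # "$@" now holds only the files after $first
        for other in "$@"; do
            # cmp -s is silent and exits 0 when the files are identical
            if cmp -s "$first" "$other"; then status=yes; else status=no; fi
            # ${var##*/} strips the directory part, leaving the file name
            echo "${first##*/} ${other##*/} $status"
        done
    done
}
```

For example, `compare_pairs /home/nir/dat` would print one line per pair. The glob sorts names as text, so 10.txt would be listed before 2.txt.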
markp-fuso