
Hi guys, I'm having an issue while using diff.

In my script I'm trying to compare all files in one directory to all files in two other directories, using diff to check whether the files are the same.

Here is my script:

#!/bin/bash

files1=()
files2=()

# Directories to compare. Adding quotes at the beginning and at the end of each file found in content1 & content3

content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')

# ADDING CONTENT INTO FILES1 & FILES2 ARRAY
while read -r line; do
        files1+=("$line")
done <<< "$content1"

# content1 and content3 go into the same array
while read -r line3;do
        files1+=("$line3")
done <<< "$content3"

while read -r line2; do
        files2+=("$line2")
done <<< "$content2"


# Here I'm trying to compare the files in files2, one by one, to all files in files1
for ((i=0; i<${#files2[@]}; i++))
do
        for ((j=0; j<${#files1[@]}; j++))
        do
                if [[ -n ${files2[$i]} ]];then
                        diff -s "${files2[$i]}" "${files1[$j]}" > /dev/null
                        if [[ $? == 0 ]]; then
                                echo ${files1[$j]} "is identical to" ${files2[$i]}
                                unset 'files2[$i]'
                                break
                        fi
                fi
        done
done

# SHOW THE FILES THAT DIDN'T MATCH
echo ${files2[@]}

I'm getting the following error when I run diff:

diff: "/data/content3/other/log2/perso log/somelog.log": No such file or directory

But when I run:

ll "/data/content3/other/log2/perso log/somelog.log"

the output is:

-rw-rw-r-- 2 lopom lopom 551M 30 oct. 18:53 '/data/content3/other/logs2/perso log/somelog.log'

So the file exists.

I need those quotes because sometimes there are spaces in the path.

Does anyone know how to fix that?

Thanks.

I already tried replacing the double quotes with single quotes, but it didn't fix it.

    `readarray -d '' files1 < <(find /data/logs -name '*.log' -type f -print0)` -- all the muss with first creating strings mushing your filenames together, adding quotes _within_ the strings and then trying to parse that content into arrays is just adding extra failure modes. – Charles Duffy Dec 07 '22 at 21:06
    Remember, filenames can contain quotes. Filenames can contain newlines. Filenames can contain wildcards. Filenames can contain binary data, EXCEPT for NULs -- which is why the NUL _and no other character_ is safe to use to separate lists of paths (the other character that can't exist in an individual filename is `/`, but that very much does exist in paths). When you try to store a list of filenames inside a single string, you're trusting those names to comply with a set of assumed rules, and setting yourself up for trouble when they don't fit those rules. – Charles Duffy Dec 07 '22 at 21:07
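Putting those two comments together, a NUL-safe version of the original all-pairs comparison might look like the sketch below. It is only a sketch under assumptions: the function name find_matches and its argument order are made up for illustration, it swaps diff for cmp -s (only the exit status is used), and readarray -d '' needs bash 4.4 or newer.

```bash
#!/usr/bin/env bash
# Sketch: compare every *.log under a source directory against every file
# under one or more other directories, by content. Paths are read
# NUL-delimited, so spaces, quotes and even newlines in names are safe.
# Requires bash >= 4.4 for `readarray -d ''`.

# Hypothetical helper; arguments: source dir, then the dirs to search.
find_matches() {
    local src=$1; shift
    local -a sources candidates
    local s c found

    # One array element per path: no embedded quotes, no sed, no re-parsing.
    readarray -t -d '' sources    < <(find "$src" -name '*.log' -type f -print0)
    readarray -t -d '' candidates < <(find "$@" -type f -print0)

    for s in "${sources[@]}"; do
        found=
        for c in "${candidates[@]}"; do
            if cmp -s "$s" "$c"; then          # exit status 0 = identical
                printf '%s is identical to %s\n' "$c" "$s"
                found=1
                break
            fi
        done
        # Report source files with no identical copy anywhere.
        [[ -n $found ]] || printf 'no match for %s\n' "$s"
    done
}

# Usage with the directories from the question:
# find_matches /data/logs /data/other/logs1 /data/other/logs2
```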

1 Answer

First, don't do this -

content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')

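Don't stack all of these into single variables. This is asking for ten kinds of obscure trouble. More importantly, those sed calls are embedding the quotation marks into the data, as part of the filenames. Quotes only act as quoting when the shell parses them in your source code; quotes that come out of a variable's expansion are ordinary characters. So diff is asked to open a file whose name literally begins and ends with a " character, and since no such file exists, it fails with "No such file or directory".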
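To see this outside of the script, here is a small demonstration in a throwaway directory (the name perso log is borrowed from the error message; everything else is made up). The same sed substitution as in the question wraps the name in quote characters, and the resulting string no longer names an existing file:

```bash
#!/usr/bin/env bash
# Demo: quote characters produced by a command or variable are plain data.
cd "$(mktemp -d)" || exit 1
touch 'perso log'                            # a real file with a space in its name

# Same idea as the question's sed calls: add a " at the start and end.
name=$(printf '%s\n' 'perso log' | sed 's/^/"/; s/$/"/')
printf '%s\n' "$name"                        # prints: "perso log" (quotes included)

[[ -e $name ]]       && echo "quoted string: found" || echo "quoted string: not found"
[[ -e 'perso log' ]] && echo "plain name: found"    || echo "plain name: not found"
```

The first test prints "not found" and the second prints "found": the only thing that makes the path work is passing it in a properly quoted expansion, not putting quote characters inside it.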

Also, if you are throwing away the output and just using diff to check whether the files are identical, try cmp instead. cmp -s is silent, and it's a lot faster, since it exits at the first differing byte instead of reading the rest of both files and generating a report. If there are a lot of files, this will add up.
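A minimal sketch of the cmp approach, branching on all three possible exit statuses. The helper name check and the temporary files are hypothetical. Note that -s means different things to the two tools: for cmp it suppresses output, while GNU diff's -s (--report-identical-files) makes diff announce identical files, which is why the question had to redirect it to /dev/null.

```bash
#!/usr/bin/env bash
# Sketch: compare files with cmp -s and branch on its exit status.
# cmp -s writes nothing to stdout: exit 0 = identical, 1 = different,
# greater than 1 = trouble (for example, a missing file).

check() {                                    # hypothetical helper name
    cmp -s -- "$1" "$2"
    case $? in
        0) echo "identical: $1 $2" ;;
        1) echo "different: $1 $2" ;;
        *) echo "error (missing or unreadable?): $1 $2" ;;
    esac
}

dir=$(mktemp -d)                             # throwaway test files
printf 'same\n'  > "$dir/x.log"
printf 'same\n'  > "$dir/y.log"
printf 'other\n' > "$dir/z.log"

check "$dir/x.log" "$dir/y.log"              # identical
check "$dir/x.log" "$dir/z.log"              # different
check "$dir/x.log" "$dir/missing.log"        # error
```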

If the logs are the only things in the directories, you don't have to scan subdirectories, and a filename can't appear in both /data/other/logs1 AND /data/other/logs2, but you're pretty sure it will be in at least one of them... then simplify:

for f in /data/logs/*.log                             # I'll assume these are all files...
do  for t in /data/other/logs[12]/"${f#/data/logs/}"  # globs expand here, but not in a plain t=... assignment
    do  if cmp -s "$f" "$t"                           # cmp -s *has* no output
        then echo "$t is identical to $f"             # files are same
        elif [[ -e "$t" ]]                            # check t exists
        then echo "$t differs from $f"                # maybe ls -l "$f" "$t" ?
        else echo "$t does not exist"                 # no match: $t is the unexpanded pattern
        fi
    done
done

This needs no arrays, no find, no sed calls, etc.
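A caveat on where globs expand, which matters for patterns like /data/other/logs[12]/...: bash performs pathname expansion on the word list of a for loop and on array assignments, but not on the right-hand side of a plain scalar assignment such as t=/data/other/logs[12]/name. A quick check in a throwaway directory (the directory and file names are invented):

```bash
#!/usr/bin/env bash
# Demo: which contexts expand a glob such as logs[12]?
cd "$(mktemp -d)" || exit 1
mkdir logs1 logs2
touch logs1/a.log

t=logs[12]/a.log                         # scalar assignment: NO pathname expansion
printf 'assignment: %s\n' "$t"           # prints: assignment: logs[12]/a.log

for t in logs[12]/a.log; do              # for-loop word list: expanded
    printf 'for loop:   %s\n' "$t"       # prints: for loop:   logs1/a.log
done

hits=(logs[12]/a.log)                    # array assignment: expanded as well
printf 'array:      %s\n' "${hits[@]}"   # prints: array:      logs1/a.log
```

If nothing matches (and nullglob is off), the for loop and the array both get the pattern itself, unchanged, which is why the loops in this answer check [[ -e "$t" ]].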

If you do need to read subdirectories, use shopt -s globstar so that globs can handle it, and you don't have to worry about parsing odd characters with read. (cf. https://mywiki.wooledge.org/ParsingLs for some of the reasons.)

shopt -s globstar
for f in /data/logs/**/*.log   # globstar makes ** match at arbitrary depth
do  for t in /data/other/logs[12]/**/"${f#/data/logs/}" # if >1 possible hit
    do  if cmp -s "$f" "$t"
        then echo "$t is identical to $f"
        elif [[ -e "$t" ]]
        then echo "$t differs from $f"
        else echo "$t does not exist"  # no match: $t is the unexpanded pattern, one iteration
        fi
    done
done
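To try the globstar loop without touching real data, here is a scratch-directory version of it. The temporary layout, the file names, the function name compare_tree and the shortened messages are all invented for the demo; the matching logic is the same as in the loop above.

```bash
#!/usr/bin/env bash
shopt -s globstar                            # bash >= 4: ** matches at any depth

root=$(mktemp -d)                            # throwaway tree (names invented)
mkdir -p "$root/logs/perso log" "$root/other/logs1/perso log"
printf 'x\n' > "$root/logs/perso log/a.log"
printf 'x\n' > "$root/other/logs1/perso log/a.log"
printf 'y\n' > "$root/logs/top.log"          # no counterpart anywhere

compare_tree() {                             # hypothetical name
    local f t rel
    for f in "$root"/logs/**/*.log; do
        rel=${f#"$root"/logs/}               # path relative to logs/, e.g. "perso log/a.log"
        for t in "$root"/other/logs[12]/**/"$rel"; do
            if cmp -s "$f" "$t"; then echo "same:    $rel"
            elif [[ -e $t ]];    then echo "differs: $rel"
            else                      echo "missing: $rel"   # $t is the unexpanded pattern
            fi
        done
    done
}

compare_tree
```

The space in "perso log" needs no special handling: each path stays a single word because it is only ever expanded inside double quotes.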
Paul Hodges