0

I'd like to get the missing files that are in file1 but not in file2 within each directory that is similar to ls -sR output. I show below the format of file1 and file2 and to the right, the expected output. I file1 and file2 is present desktop.ini but it has different size in both files, so appears in output. The content of file1 and file2 were obtained in 2 different devices and currently my option is to compare by using these files.

file1.txt                         |  file2.txt                 |  Missing files in file1 but not in file2
==========================================================================================================
    ./AB/FTP:                     |     ./AB/FTP:              |  ./AB/FTP: 
4   FileZilla.lnk                 |  4  FileZilla.lnk          |  desktop.ini "different in size"
7   desktop.ini                   |  1  desktop.ini            |
                                  |                            |  ./BX/MS Office: 
    ./BX/MS Office:               |     ./BX/MS Office:        |  OneNote 2013.lnk
4   Excel 2013.lnk                |  4  Excel 2013.lnk         |  Outlook 2013.lnk
4   OneNote 2013.lnk              |  4  PowerPoint 2013.lnk    |
4   Outlook 2013.lnk              |  4  Word 2013.lnk          |  ./D/R/Web:
4   PowerPoint 2013.lnk           |  1  desktop.ini            |  Google Chrome.lnk 
4   Word 2013.lnk                 |                            |  Internet Explorer.lnk 
1   desktop.ini                   |                            |  desktop.ini 
                                  |                            |
    ./D/R/Web:                    |                            |
4   Google Chrome.lnk             |                            |
4   Internet Explorer.lnk         |                            |
1   desktop.ini                   |                            |
                                 

I've tried with diff but it seems is not the kind of input diff command needs, or I'm not interpreting correctly the output.

$ diff -u file1.txt file2.txt
--- file1.txt      2022-01-22 13:08:54.855275200 -0400
+++ file2.txt      2022-01-22 13:09:05.785816800 -0400
@@ -1,16 +1,9 @@
-       ./AB/FTP:
-4      FileZilla.lnk
-7      desktop.ini
-
-       ./BX/MS Office:
-4      Excel 2013.lnk
-4      OneNote 2013.lnk
-4      Outlook 2013.lnk
-4      PowerPoint 2013.lnk
-4      Word 2013.lnk
-1      desktop.ini
-
-       ./D/R/Web:
-4      Google Chrome.lnk
-4      Internet Explorer.lnk
-1      desktop.ini
\ No newline at end of file
+       ./AB/FTP:
+4      FileZilla.lnk
+1      desktop.ini
+
+       ./BX/MS Office:
+4      Excel 2013.lnk
+4      PowerPoint 2013.lnk
+4      Word 2013.lnk
+1      desktop.ini
\ No newline at end of file

Thanks in advance for any help.

Suspeg
  • 29
  • 6
  • 1
    Is there a reason you want to use `ls` at all, instead of just iterating through the directory contents with globs (if you want bash), or comparing output from `find` (if you want to use traditional text-stream-comparison tools)? [Why you shouldn't parse the output of `ls`](https://mywiki.wooledge.org/ParsingLs) is relevant. – Charles Duffy Jan 22 '22 at 22:06
  • ...as for doing set arithmetic, f/e, finding things in list-A but not list-B, that's what `comm` is for; but the `ls` request throws a wrench, because someone reading this question doesn't know exactly how "similar" output needs to be to be considered responsive to the question. – Charles Duffy Jan 22 '22 at 22:07
  • `I want to` Well, so you can do it manually, you can write a program yourself to do it, you can find a programmer who will do it for you in exchange for money. If you want to write a program, interest in learning Python or Perl or Awk. If you are searching for programmers for hire, try some freelancing site. – KamilCuk Jan 22 '22 at 22:09
  • @Charles Duffy the issue is I don't have the access to directories of file2 that are in another device. I only have access to directories in file1 (machine1). And currently only have the list content for both devices stored in file1 and file2 in order to compare it. – Suspeg Jan 22 '22 at 22:21

1 Answers1

0

diff only gives you line by line comparison, while you are searching for three dimension comparison (directory, size and filename) you need to do it through some dictionaries and loops. If the file lists are too long (it will take long) or you do want to write complex code in other programming language, go for it.

First step, we need to separate data by directories.

declare -a dicts1
while read line; do
if [[ $line =~ ^\.\/.* ]]; then
  currentdict="$(echo $line | sed 's/\./\_/g;s/\//\_/g;s/\s/\_/g;s/\:/\_/g')"
  declare -A $currentdict
  dicts1=("${dicts1[@]}" "$currentdict");
else
    filename=$(echo $line | awk '{print $2}')
  filesize=$(echo $line | awk '{print $1}')
  command="$currentdict[$filename]=$filesize"
  eval $command
fi
done <  $HOME/file1.txt

Now we have associative arrays for each directory in file1, redo the code for file2 and iterate over all dictionaries:

for dictname in ${dicts1[@]};do
  command="echo \${!$dictname[@]}"
  for i in $(eval $command);do
    c2="echo \${$dictname[$i]}"
    size_value=$(eval $c2)
    filename=$i
    #INSERT CONDITIONS HERE...
  done
done

You can then search for exact key-value pair on dictionaries for file2.

By the way, you need to change every special characters in filenames into characters that can be used in variable names. The sample sed command for the dictionary names can help you with that.

acfglmrsx
  • 52
  • 3
  • What's the purpose of the `eval` use here? (See [BashFAQ #48](https://mywiki.wooledge.org/BashFAQ/048) re: why use of `eval` without an obvious, compelling reason to do so tends to be frowned on). And `${dicts1[@]}` acts just like `${dicts1[*]}` unless you quote it as `"${dicts1[@]}"` – Charles Duffy Jan 23 '22 at 17:54
  • Also, `echo $line` has the issues discussed in [I just assigned a variable, but `echo $variable` shows something else!](https://stackoverflow.com/questions/29378566/i-just-assigned-a-variable-but-echo-variable-shows-something-else); better practice to use `printf '%s\n' "$line"` if you want to be certain the variable's content is emitted unmodified. – Charles Duffy Jan 23 '22 at 17:56
  • 1
    ...and there's no _reason_ to `while read line; do firstField=$(echo $line | awk '{print $1}'); secondField=$(echo $line | awk '{print $2}')` when you could just `while read firstField secondField rest; do ...` and let `read` itself separate your line into multiple variables. – Charles Duffy Jan 23 '22 at 17:57
  • (If your purpose in `eval` is to be able to dynamically determine the associative array to assign to, you can do that with namevars, no `eval` required: `declare -g -A "$arrayname"; declare -g -n currentArray="$arrayname"; currentArray[$key]=$value; unset -n currentArray` makes `currentArray` an alias that points to `arrayname` for the duration of the assignment). – Charles Duffy Jan 23 '22 at 17:58
  • Yes eval has the purpose of dynamically determine to associative array and the script may be changed with namevars as you mentioned. But I did not get the evilness of eval use in here, a script that reads files and create dictionaries over them. – acfglmrsx Jan 23 '22 at 18:29
  • Don't understand aiming perfect commands for a simple job like that. As I understand from the question, aim is comparing two different directory contents with limited input modification. If we are aiming professional and all-case proof solution bash script definitely is not an option. [Google bash Script Style Guide](https://google.github.io/styleguide/shellguide.html) has pointed out limitations of bash clearly I guess. – acfglmrsx Jan 23 '22 at 18:38
  • I agree with you only to a point: Robust bash is almost a different language, and it takes skill to write (and not enough people know how to make its use advisable in real-world commercial environments). On the other hand, Stack Overflow is a teaching resource: We should do the work to teach how to do things right, even when it's hard. – Charles Duffy Jan 23 '22 at 18:45
  • 1
    As for `eval`, _handling untrusted inputs_ (like data from a file) is _exactly_ when you don't want to use `eval`. My best war story in terms of a former employer with a data loss event was caused by someone who trusted a filename to contain only hex digits -- trust that seemed well-placed because files in the directory at hand could only be produced by internally-written software -- until there was a buffer overflow in a C library used by some of that internal software and a script tried to delete a filename containing a `*` surrounded by whitespace, destroying their backups used for billing. – Charles Duffy Jan 23 '22 at 18:46
  • Understanded now, will try to rewrite commands without eval and with your suggestions. Thx – acfglmrsx Jan 23 '22 at 18:56
  • Thanks so much for your help and time for your solution. A kind of lost with eval thing and not sure if I can run it. Regards – Suspeg Jan 24 '22 at 05:56