
I am using the md5deep utility to compute the hashes for files while recursively digging through a directory structure.

It lets me run a command like this -

md5deep -r -l -j0 app

and gives output like this (a recursive list of the MD5 hashes of all the files under the directory, computed from their content) -

d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/controllers/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/models/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/components/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/helpers/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/behaviors/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/groups/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/fixtures/empty

I am further doing an md5sum on the result to produce a hash of the entire codebase -

md5deep -r -l -j0 app | md5sum

Output -

86df91fc29f2891ff0aa7aaa4bd13730  -

Now I am stuck on excluding some paths (files and directories) from the calculation of the final md5sum, e.g. these two paths - app/tests/groups/empty and app/tests/fixtures/empty.

The md5deep documentation describes an option (-f) that reads the list of files/directories to process from a file, but that only lets me say what to include. I am looking for the opposite, i.e. to exclude a predefined set of files/directories from the dynamic contents of a given directory (new directories/files could be added in the future).

Solutions using regular expressions or some tool/utility other than md5deep are also welcome, as long as it serves my purpose. I feel a regex solution with grep would be complicated, in the absence of lookaheads. E.g. the following regex is needed just to match any string excluding ABC -

^([^A]|A([^B]|B([^C]|$)|$)|$).*$

https://stackoverflow.com/a/1395247/351903
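
For what it's worth, grep's -v option inverts the match, so dropping lines does not need lookaheads at all. A minimal sketch, using a few of the lines from the listing above as a stand-in for real md5deep output (the variable names `hashes` and `kept` are made up for illustration):

```shell
# Sample lines standing in for md5deep output (copied from the listing above).
hashes='d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/models/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/groups/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/fixtures/empty'

# -v keeps only the lines that do NOT match;
# -F treats each -e pattern as a fixed string rather than a regex.
kept=$(printf '%s\n' "$hashes" | grep -v -F \
    -e '  app/tests/groups/empty' \
    -e '  app/tests/fixtures/empty')
printf '%s\n' "$kept"
```

In the real pipeline the same filter would presumably sit between md5deep and md5sum. Note that -F matches substrings, so a pattern like this would also drop a longer path that merely starts with the same text.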

Sandeepan Nath

1 Answer


Why not use find together with md5sum?

find app -type f -exec md5sum {} \;
d41d8cd98f00b204e9800998ecf8427e  app/tests/groups/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/components/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/behaviors/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/models/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/helpers/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/controllers/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/fixtures/empty

If you need to exclude a directory, negate the -path test with !, and if you need to exclude by file name, negate -name the same way.

For example, if you want to exclude every file whose pathname contains models, use the following:

find app -type f ! -path "*models*" -exec md5sum {} \;

BTW, if you're looking for empty files, you can use the -empty test: find app -type f -empty (without -type f, -empty also matches empty directories).
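
For the two paths named in the question, a minimal sketch could look like this. It builds a throwaway copy of part of the layout under a temporary directory (an assumption, only so the snippet runs anywhere), and it sorts the per-file lines before the final md5sum so the result does not depend on the order in which find happens to visit the files. The variable names are made up for illustration.

```shell
# Build a throwaway tree resembling the one in the question (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for d in cases/controllers cases/models groups fixtures; do
    mkdir -p "app/tests/$d"
    : > "app/tests/$d/empty"    # create an empty file
done

# Hash every regular file except the two excluded paths,
# then sort by path (field 2) for a stable order.
listing=$(find app -type f \
    ! -path "app/tests/groups/empty" \
    ! -path "app/tests/fixtures/empty" \
    -exec md5sum {} \; | LC_ALL=C sort -k 2)
printf '%s\n' "$listing"

# Collapse the per-file hashes into a single hash for the whole tree.
final=$(printf '%s\n' "$listing" | md5sum)
printf '%s\n' "$final"
```

Each `! -path` here names one exact file; a glob such as "app/tests/groups/*" would skip everything below that directory instead.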

oliv
  • Looks good. I have just one question. Is there any possibility of individual md5sums being returned in different order? Because this could lead to different results when I pipe the above output to get a final hash like this - `find app -type f ! -path "*models*" -exec md5sum {} \;`. FYI, the `-j0` in command `md5deep -r -l -j0 app | md5sum` stands for using 1 thread to prevent non-determinism due to individual md5sums being returned in different orders. – Sandeepan Nath Oct 25 '16 at 13:57
  • It seems, the above concern will arise if `find` uses multi threading. – Sandeepan Nath Oct 25 '16 at 13:59
  • @SandeepanNath The `find` will execute the `md5sum` command sequentially while traversing the whole directory tree. There is no multithreading involved in the command I posted – oliv Oct 25 '16 at 14:15
  • Just for the record, correcting myself: in my first comment, I meant `find app -type f ! -path "*models*" -exec md5sum {} \; | md5sum` to get the final hash. – Sandeepan Nath Oct 27 '16 at 11:31
  • I also need to be able to ignore multiple paths at a time, e.g. app/config and app/logs. Could you help with the same? – Sandeepan Nath Oct 27 '16 at 13:28
  • @SandeepanNath Just add as many `! -path "app/*"` tests as you want to the command. – oliv Oct 27 '16 at 19:59
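
Putting the comments above together, here is a hedged sketch for skipping app/config and app/logs while keeping the final hash reproducible. The file names and the `tree_hash` helper are hypothetical, and `sort -z` assumes GNU coreutils. find runs sequentially, but the order in which it lists directory entries is not guaranteed to be the same on every filesystem, so the file names are sorted before hashing.

```shell
# Throwaway tree (illustrative): two directories to skip, one to keep.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p app/config app/logs app/models
echo "secret" > app/config/database.php
echo "noise"  > app/logs/error.log
echo "code"   > app/models/user.php

# One "! -path" test per excluded directory. Sorting the NUL-separated
# file names fixes the order of the md5sum lines before the final hash.
tree_hash() {
    find app -type f \
        ! -path "app/config/*" \
        ! -path "app/logs/*" \
        -print0 | LC_ALL=C sort -z | xargs -0 md5sum | md5sum
}

before=$(tree_hash)

# Changing an excluded file should not change the final hash.
echo "more noise" >> app/logs/error.log
after=$(tree_hash)

[ "$before" = "$after" ] && echo "hash unchanged: $before"
```

For large excluded directories, `-prune` would avoid descending into them at all, whereas `! -path` still walks them and only filters the results.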