I'm trying to write a project that will have some autonomous components. One of these needs to diff two folders and spit out the differing files as an array of strings. dircmp does part of this: it spits out the different files. But it doesn't appear to actually go into the remaining files to see which ones differ from the same file in the other folder.

So far I've played with difflib and filecmp, and unless I'm doing something entirely wrong, I can't find a way to achieve what I'm looking for without writing it all from scratch. I need this because the Python script will be deployed on Windows boxen, where the standard Linux diff tools will not be available.

My only other thought would be to just call diff and such from the command line, but that solves neither of my problems (getting the files in an array AND not requiring GNU tools).

Can anyone help me? I'm still a total scrub at Python and would really appreciate the expert advice. Thank you!

Dr.McNinja
  • I'm commenting here largely because I'm curious and want to follow this question. I looked into something similar a few years ago when I needed Linux-style tools on a Windows box and couldn't install Cygwin but had access to Python. My particular case was find, however. I started to write a Python implementation of find and got pretty far into it (I implemented name, iname, regex, iregex, and amin, cmin, and mmin). It's hard work though. If you think it would help, I can send you the code for pyfind, but it's kludgy and 2 years old. – Jonathanb Aug 18 '11 at 01:35

1 Answer

It seems that filecmp.dircmp does what you want already. If you compare two directories, diff_files will be a list of files which are in both directories, but whose contents differ:

>>> import filecmp
>>> dc = filecmp.dircmp('dir1', 'dir2')
>>> dc.diff_files
['foo']

As pointed out by Jonathanb, if you want actual diffs, it's easy to use difflib at this point to do so.
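A minimal sketch of that second step, assuming the differing files are text files: it feeds each name in `diff_files` to `difflib.unified_diff` and collects the output lines in one list of strings. The function name `diff_changed_files` and the UTF-8 decoding are illustrative assumptions, not something from the question. Note that `dircmp` checks `os.stat` signatures (type, size, modification time) before it compares contents, so files whose signatures match are reported as identical.

```python
import difflib
import filecmp
import os


def diff_changed_files(left_dir, right_dir):
    """Return unified-diff lines for files present in both directories that differ.

    Hypothetical helper: only the top level of each directory is compared.
    """
    dc = filecmp.dircmp(left_dir, right_dir)
    lines = []
    for name in sorted(dc.diff_files):
        left_path = os.path.join(left_dir, name)
        right_path = os.path.join(right_dir, name)
        # Assumption: files are text; undecodable bytes are replaced, not raised.
        with open(left_path, encoding="utf-8", errors="replace") as f:
            left_lines = f.readlines()
        with open(right_path, encoding="utf-8", errors="replace") as f:
            right_lines = f.readlines()
        lines.extend(
            difflib.unified_diff(
                left_lines, right_lines, fromfile=left_path, tofile=right_path
            )
        )
    return lines
```

Calling `diff_changed_files('dir1', 'dir2')` would then give the array of strings the question asks for, using only the standard library, so it should work on Windows without GNU diff.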

Community
Zach Kelling
  • I think he wants a line-by-line diff of each file which is different. However, given the list from dircmp.diff_files, it would be easy to then use difflib to diff the files and construct a list of strings. – Jonathanb Aug 18 '11 at 02:09
  • And likewise, `dc.left_only` and `dc.right_only` are the files that exist only in dir1 and dir2, respectively. Also noteworthy is that the `diff_files`, `left_only`, and `right_only` attributes only exist after running one of the `dc.report*` methods. – Marty Aug 18 '11 at 02:10
  • @Marty You don't have to run a `dc.report*` method first. It's true that the attributes don't exist until you try to access them (at least since 2.6). – Zach Kelling Aug 18 '11 at 02:14
  • @zeekay thanks for the correction. That'll teach me for not looking at the source before commenting :) Very clever bit of code, that. – Marty Aug 18 '11 at 02:19
  • Very interesting, it seems this is the solution and I misinterpreted the results. Will this diff subdirectories as well? – Dr.McNinja Aug 18 '11 at 02:21
  • Agreed, nice use of `__getattr__`. Only noticed it worked that way as I was writing my answer! – Zach Kelling Aug 18 '11 at 02:23
  • @Dr.McNinja No, it doesn't diff subdirectories. – Zach Kelling Aug 18 '11 at 02:25
  • You could use `dc.common_dirs` to pick out the directories in common and then `dircmp` them, or use `os.walk`. – Zach Kelling Aug 18 '11 at 02:27
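The recursion suggested in the last comment could be sketched as follows. Rather than building a new `dircmp` for each entry in `common_dirs` by hand, this version uses the `subdirs` attribute, which maps each common directory name to a `dircmp` object for that pair. The function name `changed_files_recursive` is an illustrative assumption.

```python
import filecmp
import os


def changed_files_recursive(left_dir, right_dir):
    """Return relative paths of files present in both trees whose contents differ.

    Hypothetical helper: walks common subdirectories via dircmp.subdirs.
    """
    changed = []

    def visit(dc, prefix):
        # Files that exist on both sides at this level but differ.
        for name in dc.diff_files:
            changed.append(os.path.join(prefix, name))
        # dc.subdirs maps each name in dc.common_dirs to a dircmp for that pair.
        for name, sub_dc in dc.subdirs.items():
            visit(sub_dc, os.path.join(prefix, name))

    visit(filecmp.dircmp(left_dir, right_dir), "")
    return sorted(changed)
```

Files or directories that exist on only one side are not included here; they would come from `left_only` and `right_only` at each level.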