38

I have moved a web site from one server to another and I copied the files using SCP. I now wish to check that all the files have been copied OK. How do I compare the sites?

Should I count the files in each folder? Get the total file size for the folder tree? Or is there a better way to compare the sites?

Mark Rotteveel
Pbearne

11 Answers

80

Use diff with the recursive (-r) and brief (-q) options. When both directories are reachable from the same machine, it is a simple and very fast way to do this.

diff -r -q /path/to/dir1 /path/to/dir2

It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you whether all the files are the same.

If it shows no output, all the files are the same. Otherwise it lists the files that differ, along with any files that exist in only one of the two directories.
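
For scripting, the exit status of diff can be used instead of reading its output. The following is a minimal sketch, not part of the original answer; `compare_trees` is a hypothetical helper name, and it assumes both directories are available on the same machine.

```shell
# compare_trees is a hypothetical helper name, not part of diff itself.
# diff exits with 0 when the trees match, 1 when they differ, and 2 on trouble.
compare_trees() {
    status=0
    diff -r -q "$1" "$2" || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different"; return 1 ;;
        *) echo "error while comparing" >&2; return 2 ;;
    esac
}
```

It would be called as `compare_trees /path/to/dir1 /path/to/dir2`; any lines printed by diff appear before the final status word.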

phayes
17

If you were using scp, you could probably have used rsync.

rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.

If you were doing something like this on the old host:

scp -r from/my/dir newhost:/to/new/dir

Then you could do something like

rsync -a --progress from/my/dir newhost:/to/new/dir

The '-a' is short for 'archive' which does a recursive copy and preserves permissions, ownerships etc. Check the man page for more info, as it can do a lot of clever things.
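
One caveat: by default rsync decides that a file is up to date from its size and modification time, and scp without -p does not preserve modification times, so a plain `rsync -a` after an scp copy may want to send everything again. A hedged sketch of a verification-only run that compares file contents instead is shown here; `verify_copy` is a hypothetical helper name, and the trailing slashes make rsync compare the contents of the two directories directly.

```shell
# verify_copy is a hypothetical helper name. The flags are standard rsync options:
#   -r        recurse into directories
#   -c        compare files by checksum rather than by size and time
#   -n        dry run: report what would change, but change nothing
#   -i        itemize each file that differs
#   --delete  also report files that exist only on the destination
# Keep -n in place: without it, --delete would really remove those files.
verify_copy() {
    rsync -r -c -n -i --delete "$1"/ "$2"/
}
```

For example, `verify_copy from/my/dir newhost:/to/new/dir` (adjust the destination to wherever the files actually landed). Empty output means rsync found nothing to transfer or delete.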

Paul Dixon
  • For more on `rsync` usage, here are a couple of my answers with my favorite `rsync` commands: 1. [Ask Ubuntu: How to show the transfer progress and speed when copying files with cp?](https://askubuntu.com/a/1275972/327339) and 2. [Super User: rsync Command-line tool (Linux, Windows with Cygwin) (AKA: "How to use rsync")](https://superuser.com/a/1464264/425838) – Gabriel Staples Feb 26 '22 at 16:35
11
cd website
find . -type f -print | sort | xargs sha1sum

will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.
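
As a rough sketch of the full round trip (not from the original answer), the listing can be wrapped in a function, run once per server, and the two results compared with diff. `manifest` is a hypothetical helper name; this variant uses `find -exec` so that file names containing spaces are handled, and it sorts on the path column so that both servers produce the same order.

```shell
# manifest is a hypothetical helper name.
manifest() {
    # cd first so the recorded paths are relative to the site root,
    # then sort on the path (field 2 onward) in a locale-independent order.
    (cd "$1" && find . -type f -exec sha1sum {} + | LC_ALL=C sort -k 2)
}
```

Run `manifest website > old.sha1` on the old server and `manifest website > new.sha1` on the new one, copy one file across, and then `diff old.sha1 new.sha1` shows missing, added, and changed files. The output file names here are only examples.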

Douglas Leeder
5

Maybe you can use something similar to this:

(cd <original root dir> && find . -type f | sort | xargs md5sum) > original
(cd <new root dir> && find . -type f | sort | xargs md5sum) > new
diff original new
Eineki
3

To add to Sidney's reply: if you only need to check that the same files are present, it is not really necessary to filter with -type f or to produce hashes. Comparing the file listings is enough, and that works for empty directories as well. In reply to zidarsk8: sorting is still needed, since find, unlike ls, does not sort filenames by default, so the order can differ between machines.

To summarize, the top 3 answers would be (P.S. it is nice to do a dry run with rsync):

diff -r -q /path/to/dir1 /path/to/dir2

diff <(cd dir1 && find . | sort) <(cd dir2 && find . | sort)

rsync --dry-run -avh from/my/dir newhost:/to/new/dir
luz
1

Make checksums for all files, for example using md5sum. If they're all the same for all the files and no file is missing, everything's OK.

schnaader
1

If you used scp, you can probably also use rsync over ssh. rsync cannot copy between two remote hosts directly, so run it on one of them, for example on 1.example.com:

rsync -avH --delete-after /path/to/your/dir 2.example.com:/path/to/your/

rsync compares the files for you: by default it checks size and modification time, and with -c (--checksum) it compares checksums.

Be sure to use the -n option to perform a dry-run. Check the manual page.

I prefer rsync over scp or even local cp, every time I can use it.

If rsync is not an option, md5sum can generate MD5 digests and md5sum --check will check them.
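
A minimal sketch of that md5sum workflow, assuming GNU coreutils on both servers; `make_manifest` and `check_manifest` are hypothetical helper names, not part of md5sum.

```shell
# make_manifest and check_manifest are hypothetical helper names.

# Print an MD5 digest line for every regular file under directory $1,
# with paths recorded relative to that directory.
make_manifest() {
    (cd "$1" && find . -type f -exec md5sum {} +)
}

# Verify directory $1 against the digest list stored in file $2.
# With no file argument, md5sum --check reads the list from standard input;
# --quiet suppresses the "OK" line for each file that verifies.
check_manifest() {
    (cd "$1" && md5sum --check --quiet) < "$2"
}
```

On the old server, `make_manifest /path/to/your/dir > site.md5`; copy `site.md5` to the new server and run `check_manifest /path/to/your/dir site.md5` there. A non-zero exit status means at least one file is missing or differs. Note that this only checks the files named in the list, so extra files on the new server go unnoticed.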

Giacomo
1

...when comparing two folders across a network drive or on separate computers

If comparing two folders on the same computer, diff is fine, as explained by the main answer.

However, if trying to compare two folders on different computers, or across a network, don't do that! If across a network, it will take forever since it has to actually transmit every byte of every file in the folder across the network. So, if you are comparing a 3 GB dir, all 3 GB have to be transferred across the network just to see if the remote dir and local dir are the same.

Instead, use a SHA256 hash. Hash the directory locally on each computer, so that only the resulting hashes need to be compared. Here is how:

(From my answer here: How to hash all files in an entire directory, including the filenames as well as their contents):

# 1. First, cd to the dir in which the dir of interest is found. This is
# important! If you don't do this, then the paths output by find will differ
# between the two computers since the absolute paths to `mydir` differ. We are
# going to hash the paths too, not just the file contents, so this matters. 
cd /home/gabriel            # example on computer 1
cd /home/gabriel/dev/repos  # example on computer 2

# 2. hash all files inside `mydir`, then hash the list of all hashes and their
# respective file paths. This obtains one single final hash. Sorting is
# necessary by piping to `sort` to ensure we get a consistent file order in
# order to ensure a consistent final hash result. Piping to awk extracts 
# just the hash.
find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'

Example run and output:

$ find eclipse-workspace -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
8f493478e7bb77f1d025cba31068c1f1c8e1eab436f8a3cf79d6e60abe2cd2e4

Do this on each computer, then ensure the hashes are the same to know if the directories are the same.

Note that the above commands ignore empty directories, file permissions, timestamps of when files were last edited, etc. For most cases though that's ok.

You can also use rsync to basically do this same thing for you, even when copying or comparing across a network.

Gabriel Staples
0

I would add this as a comment on Douglas Leeder's or Eineki's answer, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do

find [dir1] -type f -print0 | xargs -0 [preferred hash function] > [file1]

find [dir2] -type f -print0 | xargs -0 [preferred hash function] > [file2]

diff -y [file1] [file2]

From experimenting, I also like to use the -W ### argument on diff (to set the output width) and write the output to a file, which makes it easier to read and parse.

Sidney
0

Try diffing your directory recursively. You'll get a nice summary if something is different in one of the directories.

driAn
0

I have moved a web site from one server to another and I copied the files using SCP

You could do this with rsync; it is great if you just want to mirror something.

/Johan

Update: It seems @rjack beat me to the rsync answer by 6 seconds :-)

Johan