This follows on from my question yesterday: Finding duplicate files via hashlib?
I now realize I need to group the files by file size first. So, say I had 10 files in a folder and 3 of them were 50 bytes each, I would group those 3 files together.
I've found that I can get a file's size in bytes using:
print os.stat('/Users/simon/Desktop/file1.txt').st_size
or:
print os.path.getsize('/Users/simon/Desktop/file1.txt')
That's great. But how would I scan a folder with os.walk and group files of the same size together using one of the methods above?
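Here's a rough sketch of what I have in mind, assuming I walk the tree and bucket paths in a dict keyed by size (the Desktop path is just my test folder):

import os
from collections import defaultdict

sizes = defaultdict(list)  # size in bytes -> list of file paths
for dirpath, dirnames, filenames in os.walk('/Users/simon/Desktop'):
    for name in filenames:
        path = os.path.join(dirpath, name)
        sizes[os.path.getsize(path)].append(path)

# Only sizes shared by 2 or more files can contain duplicates
for size, paths in sizes.items():
    if len(paths) > 1:
        print size, paths

Is that roughly the right approach, or is there a better way to pair os.walk with os.path.getsize?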
After that, I want to hash the files in each group with hashlib's MD5 to find the actual duplicates.
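For the hashing step, something like this is what I'm picturing, reading in chunks in case the files are large (the chunk size is just a guess on my part):

import hashlib

def md5_of_file(path, chunk_size=65536):
    # Build the digest incrementally so big files don't get read into memory at once
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        chunk = f.read(chunk_size)
        while chunk:
            md5.update(chunk)
            chunk = f.read(chunk_size)
    return md5.hexdigest()

Then two paths from the same size group with equal digests would be duplicates, if I understand correctly.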