I have 2 paths, each of which contains a multi-volume .7z archive.
Path A contains 4 files: example1.7z.001, example1.7z.002, example1.7z.003, example1.7z.004 (total size: 15 GB). Extracting them gives a single 20 GB 7z file, and extracting that one gives a 40 GB folder. Inside it there is a folder called TEST1, which takes 5 GB.
Path B contains 5 files: example2.7z.001, example2.7z.002, example2.7z.003, example2.7z.004, example2.7z.005 (total size: 20 GB). Extracting them gives a single 22 GB 7z file, and extracting that one gives a 50 GB folder. Here the folder TEST1 has grown to 7 GB, and there is also a new folder called TEST2, which takes 1.2 GB.
I want to write a Python script that takes these 2 paths as input and prints the existing and new files/folders that grew by more than 1 GB (if existing) or take up more than 1 GB (if new). In my example it should return TEST1 and TEST2.
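To make the rule concrete, here is a minimal sketch of just the comparison step, assuming the per-file sizes have already been obtained somehow as (relative path, size in bytes) pairs. The names `folder_sizes`, `grown_or_new` and `ONE_GB` are made up for illustration, and 1 GB is taken as 10^9 bytes.

```python
from collections import defaultdict
from pathlib import PurePosixPath

ONE_GB = 1_000_000_000  # decimal gigabyte, in bytes (assumption)


def folder_sizes(entries):
    """Sum file sizes per folder, given (relative_path, size_in_bytes) pairs."""
    totals = defaultdict(int)
    for path, size in entries:
        # Credit the file's size to every ancestor folder, keyed by its path
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                totals[str(parent)] += size
    return dict(totals)


def grown_or_new(old_sizes, new_sizes, threshold=ONE_GB):
    """Return {name: growth} for entries that grew by more than threshold.

    Entries missing from old_sizes count as 0 bytes, so a new entry is
    flagged when its full size exceeds the threshold.
    """
    result = {}
    for name, new_size in new_sizes.items():
        growth = new_size - old_sizes.get(name, 0)
        if growth > threshold:
            result[name] = growth
    return result


old = {"TEST1": 5_000_000_000}
new = {"TEST1": 7_000_000_000, "TEST2": 1_200_000_000}
print(grown_or_new(old, new))  # {'TEST1': 2000000000, 'TEST2': 1200000000}
```

With the numbers from my example, TEST1 is reported because it grew by 2 GB and TEST2 because it is new and takes 1.2 GB.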
After a bit of research, I came up with these ideas:
Using magic:
    import magic

    def compare_7zip_content(file1, file2):
        # Reads both files fully into memory and only checks byte-for-byte equality
        with open(file1, 'rb') as f1, open(file2, 'rb') as f2:
            file1_content = f1.read()
            file2_content = f2.read()
        mime = magic.Magic(mime=True)
        file1_type = mime.from_buffer(file1_content)
        file2_type = mime.from_buffer(file2_content)
        if file1_type == 'application/x-7z-compressed' and file2_type == 'application/x-7z-compressed':
            return file1_content == file2_content
        raise ValueError('One or both of the files are not 7zip format')

Using py7zr:

    import os
    import py7zr

    folder1 = '/path/to/folder1'
    folder2 = '/path/to/folder2'
    ONE_GB = 1000000000  # 1 GB in bytes

    for filename in os.listdir(folder1):
        if filename.endswith('.7z'):
            file1 = os.path.join(folder1, filename)
            file2 = os.path.join(folder2, filename)
            with py7zr.SevenZipFile(file1, mode='r') as archive1, \
                    py7zr.SevenZipFile(file2, mode='r') as archive2:
                # Map each member name to its uncompressed size
                sizes1 = {m.filename: m.uncompressed or 0 for m in archive1.list()}
                sizes2 = {m.filename: m.uncompressed or 0 for m in archive2.list()}
            # Match members by name; a member missing from archive1 counts as 0 bytes
            for name, size2 in sizes2.items():
                size_diff = size2 - sizes1.get(name, 0)
                if size_diff >= ONE_GB:
                    print(f"{name}: {size_diff / ONE_GB} GB")

Another option is to use the 7z CLI directly.
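A minimal sketch of what the CLI route might look like, assuming the `7z` executable is on PATH and that passing the first volume (`.001`) lets it open the whole set. It relies on `7z l -slt`, which prints each entry as a block of `Key = Value` lines. The helper names `parse_slt` and `list_archive` are made up for illustration. This only lists the outer archive, so the inner 7z file would still need to be extracted before its folders can be listed the same way.

```python
import subprocess


def parse_slt(listing):
    """Parse `7z l -slt` text into {path: size_in_bytes}, skipping directories."""
    # Entry blocks come after a line of dashes; the archive's own header precedes it
    _, _, body = listing.partition("----------")
    sizes = {}
    for block in body.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(" = ")
            if sep:
                fields[key.strip()] = value.strip()
        is_dir = fields.get("Attributes", "").startswith("D") or fields.get("Folder") == "+"
        if "Path" in fields and not is_dir:
            sizes[fields["Path"]] = int(fields.get("Size") or 0)
    return sizes


def list_archive(first_volume):
    """Run `7z l -slt` on an archive (or its first volume) and parse the result."""
    completed = subprocess.run(
        ["7z", "l", "-slt", first_volume],
        capture_output=True, text=True, check=True,
    )
    return parse_slt(completed.stdout)
```

Listing reads only the archive headers, so nothing is written to disk at this step.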
Can you recommend which one I should use (or suggest another idea)?