2

How to make this code works? There is a zip file with folders and .png files in it. Folder ".\icons_by_year" is empty. I need to get every file one by one without unzipping it and copy to the root of the selected folder (so no extra folders made).

class ArrangerOutZip(Arranger):
    def __init__(self):
        self.base_source_folder = '\\icons.zip'
        self.base_output_folder = ".\\icons_by_year"

    def proceed(self):
        self.create_and_copy()

    def create_and_copy(self):
        reg_pattern = re.compile('.+\.\w{1,4}$')
        f = open(self.base_source_folder, 'rb')
        zfile = zipfile.ZipFile(f)
        for cont in zfile.namelist():
            if reg_pattern.match(cont):
                with zfile.open(cont) as file:
                    shutil.copyfileobj(file, self.base_output_folder)
        zfile.close()
        f.close()


arranger = ArrangerOutZip()
arranger.proceed()
Ars T.
  • 23
  • 5
  • Does this answer your question? [How to move a file?](https://stackoverflow.com/questions/8858008/how-to-move-a-file) – sahasrara62 Nov 19 '20 at 20:27
  • @sahasrara62 Unfortunately, not. I know how to copy a file between two folders. Here is the task to copy from a zip archive files one by one and paste them to the properly created folder. Folders could be created as in the parent class, but I cannot find a proper solution for copying from zip file directly bypassing unzipping. – Ars T. Nov 19 '20 at 20:35
  • try this https://stackoverflow.com/questions/42750684/how-to-move-a-zip-file-to-a-new-destination-and-then-open-it-in-python-3 – sahasrara62 Nov 19 '20 at 20:37
  • What went wrong? Is there a traceback message? – tdelaney Nov 19 '20 at 20:47
  • The problem is that `info.filename` is a path in the zipfile, not your file system. You can only copy the file while the zipfile is open. You can use `zf.extract` or even `shutil.copyfileobj(zf.open("whatever"), other_fp)` but since the path isn't on your file system, `shutil.copy2` is bound to fail. I'm not sure how to rearrange your code to do the copy inside `with open(...) as zf`, but that's what needs to happen. – tdelaney Nov 19 '20 at 20:55
  • It would help to clarify the question - you have to unzip to get the file, even if the unzip is only in memory. Are you trying to move the compressed file to the drive? This would be extremely difficult because the compression spans all of the files in the zipfile. – tdelaney Nov 19 '20 at 21:09
  • You cannot do this without unzipping the archive. – martineau Nov 19 '20 at 21:40
  • @tdelaney I came closer to the solution. Could you have a look now? I believe I am passing the wrong data as the second attribute in shutil.copyfileobj. But cannot figure out how to do this properly =( – Ars T. Nov 21 '20 at 13:10
  • You are close! `copyfileobj` needs an open file object for the destination. – tdelaney Nov 21 '20 at 17:07
  • @tdelaney Yes, I understand that. However, cannot solve how to make it. – Ars T. Nov 21 '20 at 17:33
  • i'll write up a proposed solution – tdelaney Nov 21 '20 at 17:45

2 Answers2

0

I'd take a look at the zipfile libraries' getinfo() and also ZipFile.Path() for construction since the constructor class can also use paths that way if you intend to do any creation.

Specifically PathObjects. This is able to do is to construct an object with a path in it, and it appears to be based on pathlib. Assuming you don't need to create zipfiles, you can ignore this ZipFile.Path()

However, that's not exactly what I wanted to point out. Rather consider the following:

zipfile.getinfo()

There is a person who I think is getting at this exact situation here:

https://www.programcreek.com/python/example/104991/zipfile.getinfo

This person seems to be getting a path using getinfo(). It's also clear that NOT every zipfile has the info.

Dharman
  • 30,962
  • 25
  • 85
  • 135
user1802263
  • 93
  • 1
  • 7
  • **I believe an answer somewhere around this cod but I am failing to make it works in my code:** `with ZipFile("source", "r") as zf: with zf.open("destination", "w") as entry: shutil.copyfileobj(source, destination)` – Ars T. Nov 19 '20 at 21:28
  • I changed the main question. I came closer to the solution. Could you have a look now? I believe I am passing the wrong data as the second attribute in shutil.copyfileobj. But cannot figure out how to do this properly =( – Ars T. Nov 21 '20 at 13:22
0

shutil.copyfileobj uses file objects for source and destination files. To open the destination you need to construct a file path for it. pathlib is a part of the standard python library and is a nice way to handle file paths. And ZipFile.extract does some of the work of creating intermediate output directories for you (plus sets file metadata) and can be used instead of copyfileobj.

One risk of unzipping files is that they can contain absolute or relative paths outside of the target directory you intend (e.g., "../../badvirus.exe"). extract is a bit too lax about that - putting those files in the root of the target directory - so I wrote a little something to reject the whole zip if you are being messed with.

With a few tweeks to make this a testable program,

from pathlib import Path
import re
import zipfile
#import shutil

#class ArrangerOutZip(Arranger):
class ArrangerOutZip:
    def __init__(self, base_source_folder, base_output_folder):
        self.base_source_folder = Path(base_source_folder).resolve(strict=True)
        self.base_output_folder = Path(base_output_folder).resolve()

    def proceed(self):
        self.create_and_copy()

    def create_and_copy(self):
        """Unzip files matching pattern to base_output_folder, raising
        ValueError if any resulting paths are outside of that folder.
        Output folder created if it does not exist."""
        reg_pattern = re.compile('.+\.\w{1,4}$')
        with open(self.base_source_folder, 'rb') as f:
            with zipfile.ZipFile(f) as zfile:
                wanted_files = [cont for cont in zfile.namelist()
                    if reg_pattern.match(cont)]
                rebased_files = self._rebase_paths(wanted_files,
                    self.base_output_folder)
                for cont, rebased in zip(wanted_files, rebased_files):
                    print(cont, rebased, rebased.parent)
                    # option 1: use shutil
                    #rebased.parent.mkdir(parents=True, exist_ok=True)
                    #with zfile.open(cont) as file, open(rebased, 'wb') as outfile:
                    #    shutil.copyfileobj(file, outfile)
                    # option 2: zipfile does the work for you
                    zfile.extract(cont, self.base_output_folder)
                    
    @staticmethod
    def _rebase_paths(pathlist, target_dir):
        """Rebase relative file paths to target directory, raising
        ValueError if any resulting paths are not within target_dir"""
        target = Path(target_dir).resolve()
        newpaths = []
        for path in pathlist:
            newpath = target.joinpath(path).resolve()
            newpath.relative_to(target) # raises ValueError if not subpath
            newpaths.append(newpath)
        return newpaths


#arranger = ArrangerOutZip('\\icons.zip', '.\\icons_by_year')

import sys
try:
    arranger = ArrangerOutZip(sys.argv[1], sys.argv[2])
    arranger.proceed()
except IndexError:
    print("usage: test.py zipfile targetdir")
tdelaney
  • 73,364
  • 6
  • 83
  • 116