
I am writing a tool that automatically compresses files and uploads them to a server, and I use `writestr` to write bytes from memory directly into a `ZipFile`. For legacy reasons, the file names must display correctly on some Windows PCs that use GBK encoding.

In Python 3, `str.encode()` defaults to UTF-8. I want to know how to write the correct byte stream for the file name. The file content works fine as UTF-8, so only the name matters.

I need a code sample. Any solution is acceptable, such as a new class that inherits from an existing one. Thanks.

tripleee
Notealot
  • The [`zipfile` documentation](https://docs.python.org/3/library/zipfile.html) includes this note: *"The ZIP file standard historically did not specify a metadata encoding, but strongly recommended CP437 (the original IBM PC encoding) for interoperability. Recent versions allow use of UTF-8 (only). In this module, UTF-8 will automatically be used to write the member names if they contain any non-ASCII characters. It is not possible to write member names in any encoding other than ASCII or UTF-8."* Maybe explore third-party modules with a different feature set. – tripleee Jan 08 '23 at 10:21
  • The [`_encodeFilenameFlags` method](https://github.com/python/cpython/blob/be7c19723fa3fea3a4efe5c9c795c1e4a2fc05f5/Lib/zipfile.py#L481) looks like it could be overridden by one which uses a different encoding. – tripleee Jan 08 '23 at 11:39
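The behaviour described in the documentation note can be checked with a short script. This is only a sketch: the member name `测试.txt` and the variable names are made up for illustration, and `0x800` is general-purpose bit 11, the flag that marks a UTF-8 member name.

```python
import io
import zipfile

# Write one member with a non-ASCII name using the stock zipfile module.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("测试.txt", b"hello")

raw = buffer.getvalue()

# Read the archive back and inspect the flags recorded for the member.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    info = archive.infolist()[0]

uses_utf8_flag = bool(info.flag_bits & 0x800)       # UTF-8 name flag set?
has_utf8_name = "测试.txt".encode("utf-8") in raw   # name stored as UTF-8 bytes?
has_gbk_name = "测试.txt".encode("gbk") in raw      # name stored as GBK bytes?
```

With the unmodified module, the name should be stored as UTF-8 with the flag set, which is why a tool that assumes GBK shows it garbled.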

1 Answer


Thanks to @tripleee.

Overriding the `_encodeFilenameFlags` method works:

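import zipfile

class GBKZipInfo(zipfile.ZipInfo):
    # Override _encodeFilenameFlags (a private CPython method) so that
    # member names are stored as GBK instead of ASCII/UTF-8.
    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode('gbk'), self.flag_bits
        except UnicodeEncodeError:
            # Not representable in GBK: fall back to UTF-8 and set
            # general-purpose bit 11 (0x800), which marks a UTF-8 name.
            return self.filename.encode('utf-8'), self.flag_bits | 0x800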
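To use the class with `writestr`, one option is to pass an instance in place of the archive name. The following is a minimal sketch, not a definitive implementation: `build_archive` and the sample member name are hypothetical, and because `_encodeFilenameFlags` is a private method, this may stop working in other Python versions.

```python
import io
import time
import zipfile


class GBKZipInfo(zipfile.ZipInfo):
    """ZipInfo that stores the member name as GBK bytes when possible."""

    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode("gbk"), self.flag_bits
        except UnicodeEncodeError:
            # Fall back to UTF-8 and mark the name as UTF-8 (bit 11).
            return self.filename.encode("utf-8"), self.flag_bits | 0x800


def build_archive(members):
    """Return ZIP bytes for a {name: bytes} mapping (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            # writestr does not fill in the timestamp or permissions when it
            # is given a ZipInfo instance, so set them explicitly here.
            info = GBKZipInfo(name, date_time=time.localtime()[:6])
            info.external_attr = 0o600 << 16
            archive.writestr(info, payload)
    return buffer.getvalue()


# Everything stays in memory, as in the question.
archive_bytes = build_archive({"测试.txt": b"hello"})
```

Python's own reader decodes unflagged names as CP437, so the name looks garbled if you read the archive back with `zipfile`. A tool that interprets unflagged names using the system code page should show the name correctly on a GBK-locale Windows machine.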
Notealot