I wrote a Python script that reads an email file with the "email" module, extracts its attachments to the file system, zips the extracted files, and emails the Zip file to someone.
The attachments may have Unicode file names, for example Chinese or Japanese ones. I found that "email.header.decode_header" can retrieve the file name and its encoding. For example:
decode_header(payload.get_filename())
will produce:
[('2015\xe5\xb9\xb4\xe6\xb5\x81\xe5\xb9\xb4_Test.pages', 'utf-8')]
where the file name is encoded in UTF-8, or
[('\x1b$B%Q%=%3%s;q;:4IM}BfD"!J\x1b(BS&T HK\x1b$B!K\x1b(B_\x1b$B8=COD4C#\x1b(BPC.xls', 'iso-2022-jp')]
which contains Japanese characters.
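A minimal sketch of turning the (value, charset) pairs from decode_header into a single text file name. The helper name decode_filename and the sample header built with email.header.Header are assumptions for illustration, not part of my script:

```python
from email.header import Header, decode_header


def decode_filename(raw_name):
    """Join the (value, charset) parts from decode_header into one text string."""
    parts = []
    for value, charset in decode_header(raw_name):
        if isinstance(value, bytes):
            # Encoded parts arrive as bytes; fall back to ASCII when no charset is given.
            value = value.decode(charset or "ascii", "replace")
        parts.append(value)
    return u"".join(parts)


# Build an RFC 2047 encoded sample header, then decode it again.
encoded = Header(u"2015\u5e74\u6d41\u5e74_Test.pages", "utf-8").encode()
name = decode_filename(encoded)
```

The same helper should handle the iso-2022-jp case, since it decodes each part with the charset that decode_header reports.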
In the script I convert the file name to UTF-8 and save the file on the file system (Linux), then create a Zip file and send it via email. When the user retrieves the Zip file and extracts it on Windows, the file names inside it turn into garbage.
I searched Google and Stack Overflow and found that the Windows file system stores names as Unicode (UTF-16) rather than UTF-8, which would explain why I can open the Zip file without problems on macOS but not on Windows. I also tried changing the script to create the file with a unicode name:
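For reference, a sketch of how the name ends up inside the archive. The ZIP format stores entry names as bytes and uses general-purpose flag bit 11 (0x800) to mark them as UTF-8. Python's zipfile sets that bit when the entry name is passed as text containing non-ASCII characters; in Python 2 a UTF-8 byte string is written as-is without the flag. Whether a given Windows extractor honors the flag depends on the tool, so this only checks what gets written. The sample name and in-memory buffer are assumptions for illustration:

```python
import io
import zipfile

name = u"2015\u5e74\u6d41\u5e74_Test.pages"  # text (unicode) entry name

# Write a one-entry archive into memory, passing the name as text.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, b"dummy content")

# Read it back and inspect the UTF-8 flag (bit 11) and the stored name.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    info = zf.infolist()[0]
    utf8_flag_set = bool(info.flag_bits & 0x800)
    stored_name = info.filename
```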
filename = unicode('\x1b$B%Q%=%3%s;q;:4IM}BfD"!J\x1b(BS&T HK\x1b$B!K\x1b(B_\x1b$B8=COD4C#\x1b(BPC.xls', 'iso-2022-jp')
f = open(filename, 'wb')
....
I can create the file without any problem when I run the commands above in the Python shell. However, when I put exactly the same commands into my script, the error
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-6: ordinal not in range(128)
is raised.
Can anyone suggest how to solve this problem, so that I can create a Zip file that opens on Windows with the correct file names?
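One possible explanation, which I have not confirmed, is that the script runs under a locale (for example LANG=C) where Python's file system encoding is ASCII, while the interactive shell runs under a UTF-8 locale. A sketch that sidesteps the implicit encoding by passing an explicitly UTF-8 encoded byte path to open(); the temporary directory stands in for my real output directory and is assumed to have an ASCII path:

```python
import os
import sys
import tempfile

name = u"2015\u5e74\u6d41\u5e74_Test.pages"

# Worth printing in both the shell and the script to compare environments.
fs_encoding = sys.getfilesystemencoding()

target_dir = tempfile.mkdtemp()  # hypothetical output directory
dir_bytes = target_dir.encode("utf-8")

# Encode the name explicitly instead of relying on the file system encoding.
encoded_path = os.path.join(dir_bytes, name.encode("utf-8"))
with open(encoded_path, "wb") as f:
    f.write(b"attachment bytes")

created = os.listdir(dir_bytes)  # byte path in, byte names out
```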