I ran into a problem parsing an attachment filename that uses URL-style percent-encoding:
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename*0*=utf-8''%48%61%72%6D%6F%6E%6F%67%72%61%6D;
    filename*1*=%32%30%31%38%20%C5%81%75%6B%61%73%7A%65%77;
    filename*2*=%61%20%33%35%2E%70%64%66
For this header, get_filename already returns the decoded filename, including the non-ASCII characters. decode_header cannot handle that value. Here is the exception:
  File "/usr/lib/python2.7/email/header.py", line 73, in decode_header
    header = str(header)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0141' in position 26: ordinal not in range(128)
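For context, the `filename*0*=` form is an RFC 2231 continuation parameter, and `get_filename` collapses and decodes it on its own. The following is only a minimal sketch, assuming Python 3 (the traceback above comes from Python 2.7, where the value is a `unicode` object instead); the `Content-Type` line and the payload are placeholders added to make the sample parseable.

```python
import email

# Content-Disposition copied from the message above; the Content-Type
# line and the "AAAA" payload are placeholders, not real data.
RAW = (
    "Content-Type: application/pdf\n"
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: attachment;\n"
    " filename*0*=utf-8''%48%61%72%6D%6F%6E%6F%67%72%61%6D;\n"
    " filename*1*=%32%30%31%38%20%C5%81%75%6B%61%73%7A%65%77;\n"
    " filename*2*=%61%20%33%35%2E%70%64%66\n"
    "\n"
    "AAAA\n"
)

msg = email.message_from_string(RAW)
name = msg.get_filename()

# The three segments are joined and the percent-escapes are decoded,
# so the value is plain text that contains a non-ASCII character.
print(repr(name))
print("still percent-encoded:", "%" in name)
print("ASCII only:", all(ord(ch) < 128 for ch in name))
```

If this holds, the value returned here needs no further call to `decode_header`.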
Here is the code I use to get the filename:
for part in msg.walk():
    if part.get_content_maintype() == 'multipart':
        continue
    content = part.get_payload(decode=True)
    if content:
        filename = part.get_filename()
        if filename:
            filename = decode_header(filename)
            [...]
All other, ordinary encodings seem to work. The problem is that get_filename sometimes returns an already decoded string and sometimes a still-encoded one. How can I resolve this?
The exact filename for which I get the UnicodeEncodeError is "Harmonogram 2018 Łukaszewa 35.pdf".
A second message contains the following header, and decoding it works:
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename="To =?UTF-8?b?Y8OzxbwsIMW8ZQ==?= ze Szwecji, to nic,
    =?UTF-8?b?xbxl?= ze Szwecji..xlsx"
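One possible way to cope with both forms is to call `decode_header` only when the value still contains RFC 2047 encoded words (`=?charset?b?...?=`), as in this second header, and to use the value as-is otherwise. This is a sketch under assumptions, not a confirmed fix: it targets Python 3, `ENCODED_WORD` and `attachment_filename` are hypothetical names, and the two sample parts are synthetic.

```python
import re
from email.header import decode_header, make_header
from email.message import Message

# Rough pattern for an RFC 2047 encoded word such as =?UTF-8?b?xbxl?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def attachment_filename(part):
    """Return the part's filename as text, or None if it has none."""
    name = part.get_filename()
    if not name:
        return None
    if ENCODED_WORD.search(name):
        # Still MIME-encoded: decode each word and join the pieces.
        return str(make_header(decode_header(name)))
    # Already decoded by get_filename() (the RFC 2231 case).
    return name


# Synthetic parts for illustration only.
encoded = Message()
encoded.add_header("Content-Disposition", "attachment",
                   filename="=?UTF-8?b?xbxl?=")
plain = Message()
plain.add_header("Content-Disposition", "attachment",
                 filename="Łukaszewa 35.pdf")

print(repr(attachment_filename(encoded)))
print(repr(attachment_filename(plain)))
```

Checking for encoded words first means an already decoded, non-ASCII value never reaches `decode_header`, which is the call that raises in the traceback from Python 2.7.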