UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' (writing to PDF)

Question

I am having a Unicode issue with a variable's contents when writing to a PDF with Python.

It's outputting this error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013'

Basically, it is getting caught on a dash character (U+2013, which is an en dash rather than an em dash).
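Here is a minimal reproduction outside FPDF and Outlook, using plain Python 3 and only the standard library. It shows that U+2013 has no slot in latin-1, although Windows-1252 (cp1252) does have one:

```python
import unicodedata

dash = "\u2013"
print(unicodedata.name(dash))  # EN DASH

try:
    dash.encode("latin-1")  # latin-1 only covers U+0000..U+00FF
except UnicodeEncodeError as exc:
    print(exc)  # 'latin-1' codec can't encode character '\u2013' in position 0: ordinal not in range(256)

# Windows-1252 maps the en dash to the single byte 0x96.
print(dash.encode("cp1252"))  # b'\x96'
```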

I tried taking the variable whose contents include the dash and redefining it with `.encode('utf-8')`, as below:

Body = msg.Body

BodyC = Body.encode('utf-8')

Now I get this error instead:

Traceback (most recent call last):
  File "script.py", line 37, in <module>
    pdf.cell(200, 10, txt="Bod: " + BodyC,  ln=4, align="C")
TypeError: can only concatenate str (not "bytes") to str
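The `TypeError` is a separate problem. In Python 3, `str.encode` returns `bytes`, and `bytes` cannot be concatenated with `str`. Here is a stdlib-only sketch, where `body` is a made-up stand-in for `msg.Body`:

```python
# `body` is a made-up stand-in for msg.Body containing an en dash.
body = "Total \u2013 see attached"

body_bytes = body.encode("utf-8")
print(type(body_bytes))  # <class 'bytes'>

try:
    "Bod: " + body_bytes  # str + bytes is not allowed in Python 3
except TypeError as exc:
    print(exc)  # can only concatenate str (not "bytes") to str

# Decoding gives back the original str, dash included, so the latin-1 problem remains.
print(body_bytes.decode("utf-8") == body)  # True
```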

Below is my full code. How can I simply fix the Unicode error in the contents of the `Body` variable?

I'd like to convert to UTF-8 or Western, anything other than latin-1. Any suggestions?

Full Code:

from fpdf import FPDF
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(r"C:\User\language\python\Msg-To-PDF\test_msg.msg")

print (msg.SenderName)
print (msg.SenderEmailAddress)
print (msg.SentOn)
print (msg.To)
print (msg.CC)
print (msg.BCC)
print (msg.Subject)
print (msg.Body)

SenderName = msg.SenderName
SenderEmailAddress = msg.SenderEmailAddress
SentOn = msg.SentOn
To = msg.To
CC = msg.CC
BCC = msg.BCC
Subject = msg.Subject
Body = msg.Body
BodyC = Body.encode('utf-8')

pdf = FPDF()
pdf.add_page()

# pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
pdf.set_font("Helvetica", style = '', size = 11)
pdf.cell(200, 10, txt="From: " + SenderName, ln=1, align="C")
# pdf.cell(200, 10, border=SentOn, ln=1, align="C")
pdf.cell(200, 10, txt="To: " + To, ln=1, align="C")
pdf.cell(200, 10, txt="CC: " + CC, ln=1, align="C")
pdf.cell(200, 10, txt="BCC: " + BCC, ln=1, align="C")
pdf.cell(200, 10, txt="Subject: " + Subject, ln=1, align="C")
pdf.cell(200, 10, txt="Bod: " + BodyC,  ln=4, align="C")

pdf.output("Sample.pdf")

How can I switch away from latin-1?

Is there any way to fix these issues globally?

It still produces the exact same error 'UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 485: ordinal not in range(256)' — Dr Upvote, Jun 25 '19 at 20:26
Try this answer: https://stackoverflow.com/questions/6539881/python-converting-from-iso-8859-1-latin1-to-utf-8 — ladygremlin, Jun 25 '19 at 20:33
`BodyC = Body.encode('utf-8')` actually does nothing useful! Another point: the `\u2013` in the error output is `unicode`, but the system-wide encoding is not set properly. A warning: is a user class or subprocess being called with the default encoding? Most encoding errors are thrown by non-raw file/IO objects. @ladygremlin Windows always raises these errors; I solved it by setting the system-wide encoding to UTF-8 (not Unicode). — dsgdfg, Jun 25 '19 at 20:37
@dsgdfg Ahhh, I didn't realize Windows always throws this. That's not my OS of choice. :) — ladygremlin, Jun 25 '19 at 20:38
In Python IDLE, `'\x64\x45'+'teest'` gives `'dEteest'`, but that was with `python2.7.X`; if you use `python3.x`, convert the bytes to a string using the source encoding. — dsgdfg, Jun 25 '19 at 20:46
Possible duplicate of [Python : UnicodeEncodeError: 'latin-1' codec can't encode character](https://stackoverflow.com/questions/8290206/python-unicodeencodeerror-latin-1-codec-cant-encode-character) — phuclv, Jun 27 '19 at 00:18
[UnicodeEncodeError: 'latin-1' codec can't encode character](https://stackoverflow.com/q/3942888/995714) — phuclv, Jun 27 '19 at 00:18
@phuclv So I fixed this specific error, but how can I handle these issues globally? — Dr Upvote, Jun 27 '19 at 13:13

score 21 · Accepted Answer · answered Jul 04 '19 at 22:07


A workaround is to convert all text to latin-1 encoding before passing it on to the library. You can do that with the following command:

text2 = text.encode('latin-1', 'replace').decode('latin-1')

`text2` will be free of any non-latin-1 characters. However, some characters may be replaced with `?`.
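The same round trip can be wrapped in a small helper so it is easy to apply to every field (`SenderName`, `Subject`, `Body`, and so on) before calling `pdf.cell()`. The name `to_latin1` is hypothetical. Note that characters which do exist in latin-1, such as `é`, pass through unchanged:

```python
def to_latin1(text: str, errors: str = "replace") -> str:
    """Round-trip text through latin-1 so it only contains characters latin-1 can hold."""
    # errors="replace" turns unsupported characters into "?"; "ignore" drops them.
    return text.encode("latin-1", errors).decode("latin-1")


sample = "Caf\u00e9 Q3 \u2013 notes"
print(to_latin1(sample))            # Café Q3 ? notes
print(to_latin1(sample, "ignore"))  # Café Q3  notes
```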


Erik Kalkoken


Does this work in Python 3? I'm having issues getting this to work. I can convert it to a string with `?`, however fpdf still rejects it... – BostonMacOSX Nov 03 '20 at 20:39
Yes, I ran this with Python 3 too – Erik Kalkoken Nov 04 '20 at 02:19
My apostrophes (') are all coming out as `?`... Have you used the font substitution method where you define a UTF-8 font? – BostonMacOSX Nov 04 '20 at 12:45

This is the solution. To add to this answer, you can also drop the unsupported characters instead of replacing them: `text.encode('latin-1', 'ignore').decode('latin-1')` – Akshay Jan 23 '23 at 21:51
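A possible refinement of the latin-1 round trip, aimed at the "apostrophes come out as `?`" complaint: first map common typographic characters to close ASCII look-alikes with `str.translate`, and only then fall back to `replace`. The mapping table and the name `latin1_safe` are assumptions for illustration; extend the table to suit your data:

```python
# Hypothetical mapping of common typographic characters to latin-1-safe stand-ins.
TYPOGRAPHIC_FALLBACKS = str.maketrans({
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
})


def latin1_safe(text: str) -> str:
    """Swap typographic characters for ASCII look-alikes, then replace anything latin-1 still can't hold."""
    return text.translate(TYPOGRAPHIC_FALLBACKS).encode("latin-1", "replace").decode("latin-1")


print(latin1_safe("It\u2019s done \u2013 mostly\u2026 \u2713"))  # It's done - mostly... ?
```

Characters outside the table (the check mark above) still become `?`, so this reduces the damage rather than removing it; a Unicode font remains the complete fix.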

score 5 · Answer 2 · answered Aug 05 '19 at 14:04

The reason for this error is that you are trying to render a character in your PDF that is outside the code range of latin-1 encoding. FPDF uses latin-1 as the default encoding for all its built-in fonts.
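To see exactly which characters in a piece of text fall outside latin-1 (that is, above U+00FF), a small stdlib-only check can help decide between stripping characters and switching fonts. The helper name `non_latin1_chars` and the sample text are hypothetical:

```python
import unicodedata
from typing import List


def non_latin1_chars(text: str) -> List[str]:
    """Return the distinct characters in text that fall outside latin-1 (above U+00FF)."""
    return sorted({ch for ch in text if ord(ch) > 0xFF})


body = "Smart \u201cquotes\u201d \u2013 and caf\u00e9"
for ch in non_latin1_chars(body):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
# U+2013 EN DASH
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
```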

So, as a workaround, you can remove all characters from your text that do not fit into latin-1 encoding (see my other answer for this workaround).

To fix this error and be able to render those characters in your PDF, you need to use fonts that support a wider range of characters. For this, the FPDF library supports Unicode fonts.

For example, you could get the free Google Noto fonts, which support a wide range of Unicode code points. For most Western languages I would recommend the NotoSans font set, but you can also get fonts for many other languages and scripts, including Chinese, Hebrew, and Arabic.

Here is how to enable the Unicode fonts in your code for FPDF:

First you need to tell the FPDF library where it can find the font files. In this example I am setting it to the `fonts` sub-folder of the folder containing the script.

import os
import fpdf

# Look for TrueType fonts in the "fonts" folder next to this script
fpdf.set_global("SYSTEM_TTFONTS", os.path.join(os.path.dirname(__file__), 'fonts'))

Then you need to add the fonts to your PDF document. In this example I am adding the NotoSans fonts for the styles normal, bold, italic and bold-italic:

pdf = fpdf.FPDF()
pdf.add_font("NotoSans", style="", fname="NotoSans-Regular.ttf", uni=True)
pdf.add_font("NotoSans", style="B", fname="NotoSans-Bold.ttf", uni=True)
pdf.add_font("NotoSans", style="I", fname="NotoSans-Italic.ttf", uni=True)
pdf.add_font("NotoSans", style="BI", fname="NotoSans-BoldItalic.ttf", uni=True)

Now you can use the new fonts normally in your PDF document with set_font(). Here is an example for normal text:

pdf.set_font("NotoSans", size=12)

I tried this solution and got this error: `AttributeError: module 'fpdf' has no attribute 'set_global'`. Is a specific version of fpdf recommended? It fails at `fpdf.set_global...`. I skipped `set_global` and gave a relative path in `pdf.add_font(...)`, and it works: `pdf.add_font("NotoKufiArabic", style="", fname="./fonts/NotoKufiArabic-Regular.ttf", uni=True)` — akarahman, Sep 08 '21 at 07:01

score 1 · Answer 3 · answered Jun 11 '20 at 18:07

You can also change the encoding through the .set_doc_option() method (documentation here). I tried Erik's method, which worked for me, but then after adding some more complexities (such as a second PDF and using the write_html() method which required creating a new class), I went back to having the same error. Changing the encoding for the whole document should solve the overall problem as you said.

The Read the Docs page says you can only use latin-1 or windows-1252, but `pdf.set_doc_option('core_fonts_encoding', 'utf-8')` worked for me according to the debugger. Just be aware that some characters will need fixing, like the apostrophe (') showing as Ã¢Â€ÂTM in the PDF.
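The garbled apostrophe is classic mojibake: a character's UTF-8 bytes are read back with a single-byte encoding. Here is a stdlib-only illustration, assuming the original character was a typographic apostrophe (U+2019); the exact garbage in the PDF may differ if the text is encoded more than once:

```python
# Assumes the garbled character was a typographic apostrophe (U+2019), not a plain ASCII '.
apostrophe = "\u2019"

utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# Reading those three UTF-8 bytes as Windows-1252 turns one character into three.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # â€™
```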

Hope this is the global fix for this issue you were looking for, even if several months late!

Not working: `'FPDF' object has no attribute 'set_doc_option'` — Carlost, Nov 04 '22 at 16:44

score 0 · Answer 4 · answered Sep 08 '21 at 07:14

I tried Erik's solution with some changes, and it works great with a mix of English and Arabic text. Sample code to generate a PDF using PyFPDF is below.

from datetime import datetime

from fpdf import FPDF


def getFileName():
    # Timestamped file name (day_hour_minute_second), e.g. Test_08_07_14_05.pdf
    now = datetime.now()
    time = now.strftime('%d_%H_%M_%S')
    filename = "Test_" + time + ".pdf"
    return filename


pdf = FPDF()

# Download NotoSansArabic-Regular.ttf from Google Noto fonts
pdf.add_font("NotoSansArabic", style="", fname="./fonts/NotoSansArabic-Regular.ttf", uni=True)

pdf.add_page()

pdf.set_font('Arial', '', 12)
pdf.write(8, 'Hello World')
pdf.ln(8)

# مرحبا (Marhaba), "hello" in Arabic
pdf.set_font('NotoSansArabic', '', 12)
text = 'مرحبا'
pdf.write(8, text)
pdf.ln(8)

pdf.output(getFileName(), 'F')
