How to export DataFrame to Html with utf-8 encoding?

Question

I keep getting:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)

when I try:

df.to_html("mypage.html")

Here is a minimal example that reproduces the problem:

import pandas as pd

df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
df.to_html("mypage.html")

The elements of column "a" are of type "unicode".

Exporting to CSV works, because to_csv accepts an encoding argument:

df.to_csv("myfile.csv", encoding="utf-8")

This works fine on Python 3; presumably you're using Python 2? – EdChum, Mar 09 '16 at 14:45
@EdChum I've personally run into this on Python 2.7 numerous times. – Ami Tavory, Mar 09 '16 at 14:46
@YOBA HTML generation is neither fast nor vectorizable. I usually just iterate over records and use something like [this](https://pypi.python.org/pypi/html). – Ami Tavory, Mar 09 '16 at 14:48
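To illustrate the iterate-over-records approach from the comment above, here is a minimal sketch. It swaps in the standard library's html.escape for the linked PyPI package, and records_to_html is a hypothetical helper name, not part of pandas:

```python
import html

import pandas as pd


def records_to_html(df):
    """Hypothetical helper: build an HTML table by iterating over records."""
    rows = []
    # Header row built from the column labels
    header = "".join("<th>{}</th>".format(html.escape(str(c))) for c in df.columns)
    rows.append("<tr>{}</tr>".format(header))
    # One <tr> per record; html.escape handles &, <, > and quotes
    for record in df.itertuples(index=False):
        cells = "".join("<td>{}</td>".format(html.escape(str(v))) for v in record)
        rows.append("<tr>{}</tr>".format(cells))
    return "<table>\n{}\n</table>".format("\n".join(rows))


df = pd.DataFrame({"a": ["Rue du Gu\xe9, 78120 Sonchamp"], "b": ["x < y"]})
table = records_to_html(df)

# Write the text with an explicit encoding (Python 3 open())
with open("mypage.html", "w", encoding="utf-8") as f:
    f.write('<meta charset="UTF-8">\n' + table)
```

Because the string is encoded only once, at the point where it is written, the ASCII default never comes into play.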

score 8 · Answer 1 · answered Aug 30 '19 at 00:07


This is what worked for me:

html = df.to_html()

# Python 3: open() takes an encoding argument
with open("dataframe.html", "w", encoding="utf-8") as file:
    file.write('<meta charset="UTF-8">\n')  # declare the charset for browsers
    file.write(html)


Vladimir Kulyashov

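The open(..., encoding="utf-8") call above only works on Python 3; Python 2's built-in open(), which the question appears to be using, has no encoding argument. A sketch intended to work on both, assuming io.open and a hypothetical write_utf8_html helper, which also places the meta tag inside a proper head element:

```python
import io

import pandas as pd


def write_utf8_html(df, path):
    """Hypothetical helper: write df as a small UTF-8 HTML page."""
    table = df.to_html()
    # If to_html() hands back a byte string (possible on Python 2), decode it,
    # because io.open in text mode expects text
    if isinstance(table, bytes):
        table = table.decode("utf-8")
    page = (u'<!DOCTYPE html>\n<html>\n<head>\n<meta charset="UTF-8">\n</head>\n'
            u"<body>\n" + table + u"\n</body>\n</html>\n")
    # io.open accepts an encoding argument on both Python 2 and Python 3
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(page)


df = pd.DataFrame({"a": [u"Rue du Gu\xe9, 78120 Sonchamp"], "b": [u"some other thing"]})
write_utf8_html(df, "dataframe.html")
```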

score 3 · Accepted Answer · answered Mar 09 '16 at 16:43


Your problem is elsewhere in your code. The Unicode string in your sample was mis-decoded as latin1, Windows-1252, or similar: it contains UTF-8 byte sequences (\xc3\xa9 is the UTF-8 encoding of é). Here I undo the bad decoding and re-decode as UTF-8, but you'll want to find where the wrong decode is being performed:

>>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
>>> s.encode('latin1').decode('utf8')
u'Rue du Gu\xe9, 78120 Sonchamp'
>>> print(s.encode('latin1').decode('utf8'))
Rue du Gué, 78120 Sonchamp


Mark Tolonen


Thanks, but it doesn't solve my problem. I still get the same error even though I applied it to all the elements of "a" in the DataFrame: `df["a"] = df.apply(lambda x: x["a"].encode('latin1').decode('utf8'), axis=1)` – YOBA Mar 10 '16 at 13:31
@YOBA are you sure the strings are supposed to be Unicode? The error you are getting is typical of using `.decode` on an already decoded Unicode string in Python 2.x. – Mark Tolonen Mar 10 '16 at 15:33
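The mis-decoding described in the accepted answer can be reproduced in a few lines of Python 3. This is a sketch with illustrative variable names; it shows why encoding to Latin-1 and then decoding as UTF-8 recovers the original text:

```python
correct = "Rue du Gu\xe9, 78120 Sonchamp"  # "Rue du Gué, 78120 Sonchamp"

# UTF-8 encodes "é" (U+00E9) as the two bytes 0xC3 0xA9 ...
raw = correct.encode("utf-8")
# ... and decoding those bytes as Latin-1 turns each byte into its own character
garbled = raw.decode("latin-1")
print(garbled)  # Rue du GuÃ©, 78120 Sonchamp

# The fix: get the original bytes back via Latin-1, then decode them as UTF-8
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)  # Rue du Gué, 78120 Sonchamp
```

The garbled value is exactly the string from the question, which suggests the bad decode happened before the data reached the DataFrame.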

Prof · Answer 3 · answered Mar 10 '19 at 18:29

The issue is actually using df.to_html("mypage.html") to write the HTML file directly. If you instead get the HTML as a string and write the file yourself with an explicit encoding, you avoid this pandas encoding bug:

html = df.to_html()
with open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)

You may also need to declare the character set in the head of the HTML for it to display correctly in some browsers. HTML5 expects UTF-8, but a browser opening a file that has no declaration may still fall back to a legacy encoding:

<meta charset="UTF-8">

This was the only method that worked for me out of the several I've seen.
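Newer pandas versions make the manual file handling unnecessary. Assuming pandas 1.0 or later, DataFrame.to_html accepts an encoding argument that is used when writing to a path. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": ["Rue du Gu\xe9, 78120 Sonchamp"], "b": ["some other thing"]})

# Assumes pandas >= 1.0, where to_html() has an encoding parameter
df.to_html("mypage.html", encoding="utf-8")

# Read the file back with the same encoding to check the accented character
with open("mypage.html", encoding="utf-8") as f:
    print("Gu\xe9" in f.read())  # True
```

This writes only the table fragment, so the meta charset tag is still worth adding if a browser guesses the wrong encoding.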

score 1 · Answer 4 · answered Oct 24 '17 at 08:58

If you really need HTML output, you could try cleaning the strings in a NumPy array before calling to_html:

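import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})

def clean_unicode(df):
    # Transform the DataFrame to a NumPy array (as_matrix() was removed in pandas 1.0)
    values = df.to_numpy(dtype=object, copy=True)
    # Re-decode every string as UTF-8 ("replace" turns non-Latin-1 characters into "?");
    # correctly decoded non-ASCII text may raise UnicodeDecodeError here
    for idx, x in np.ndenumerate(values):
        if isinstance(x, str):
            values[idx] = x.encode("latin-1", "replace").decode("utf8")
    # Transform the NumPy array back to a DataFrame, keeping the original labels
    return pd.DataFrame(values, index=df.index, columns=df.columns)

df = clean_unicode(df)
df.to_html("Results.html")  # Success!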
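The same Latin-1 to UTF-8 repair can be applied column-wise without converting to NumPy, using pandas' vectorized Series.str.encode and Series.str.decode. This is a sketch: fix_mojibake is a hypothetical helper, and it assumes every string in the chosen columns was mis-decoded:

```python
import pandas as pd

df = pd.DataFrame({"a": ["Rue du Gu\xc3\xa9, 78120 Sonchamp"], "b": ["some other thing"]})


def fix_mojibake(df, columns):
    """Hypothetical helper: undo a Latin-1 mis-decode in the given columns."""
    fixed = df.copy()
    for col in columns:
        # .str.encode turns each string into bytes; .str.decode reads them as UTF-8
        fixed[col] = fixed[col].str.encode("latin-1").str.decode("utf-8")
    return fixed


df = fix_mojibake(df, ["a"])
print(df.loc[0, "a"])  # Rue du Gué, 78120 Sonchamp
```

Restricting the repair to named columns avoids touching text that was already decoded correctly.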
