'UnicodeEncodeError' when using 'set_dataframe'

Question

When using set_dataframe to update my Google Sheets via pygsheets and pandas, I get error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 13: ordinal not in range(128)

This is due to utf-8 marks over some text, e.g.,: "señor"

This happens on executing:

wks.set_dataframe(df, start='A10')

Pandas to_csv accepts an encoding parameter similar to encoding="utf-8", may I suggests set_dataframe does the same?

wks.set_dataframe(df, start='A10', encoding="utf-8")

I see there's a ticket opened 10 days ago here but is there a workaround?

alejandro · Answer 1 · 2019-08-10T02:25:22.977

Solution:

I ran into the same issue, and I think that, more than a bug in the pygsheets module, it would be a limitation as you clearly point out.

What I did to solve this issue was:

def encodeDataFrame(df, encoding='UTF-8'):
  if df is not None and not df.empty:
    for column, series in df.items():
      if type(series.at[0]) == unicode:
        try:
          encodedSeries = series.str.encode(encoding)
          df[column] = encodedSeries
        except Exception as e:
          print 'Could not encode column %s' % column
          raise

And you can call the function this way:

encodeDataFrame(df)
wks.set_dataframe(df, start='A10')

This might no longer be a good solution because of a change that was made in pygsheets to avoid this issue. See EDIT section below

Explanation:

You solve the issue by encoding the unicode values yourself, before sending them to the set_dataframe function.

This problem comes up whenever you try to use the Worksheet.set_dataframe function, using a dataframe that contains unicode characters that cannot be encoded in ascii (like accents, and many other).

The exception is thrown because the set_dataframe function attempts to cast the unicode values into str values (using the default encoding). For Python 2, the default encoding is ascii and when a character out of the range of ascii is found, the exception is thrown.

Some people have suggested reloading the sys module to circumvent this problem, but here is explained why you should not do it

The other solution I would think of would be to use the pygsheets module in Python 3, where this should no longer be a problem because the default encoding for Python 3 is UTF-8 (see docs)

Bonus:

Ask yourself:
1) Is Unicode an encoding?
2) What is an encoding?

If you hesitated with any of those questions, you should read this article, which gave me the knowledge needed to think of this solution. The time invested was completely worth it.

For more information, you can try this article which links to the previous one at the end.

EDIT:

A change was made 2 days ago (07/26/19) to pygsheets that is supposed to fix this. It looks like the intention is to avoid encoding into the str type, but I would think that this change might try decoding strings into the unicode type from the default ascii encoding, which could also lead to trouble. When/If this change is released it is probably going to be better not to encode anything and pass values as unicode to the set_dataframe function.

EDIT 2:

This change has now been released in version 2.0.2. If my predictions were true, using the encodeDataFrame function I suggested will result in a UnicodeDecodeError because the Worksheet.set_dataframe function will attempt to decode the str values with the default ascii encoding. So the best way to use the function will be not to have any str values in your dataframe. If you have them, decode them to unicode before calling the set_dataframe function. You could have a mirror version of the function I suggested. It would look something like this:

def decodeDataFrame(df, encoding='UTF-8'):
  if df is not None and not df.empty:
    for column, series in df.items():
      if type(series.at[0]) == str:
        try:
          decodedSeries = series.str.decode(encoding)
          df[column] = decodedSeries
        except Exception as e:
          print 'Could not decode column %s' % column
          raise

'UnicodeEncodeError' when using 'set_dataframe'

1 Answers1