Solution:
I ran into the same issue, and I think that, more than a bug in the pygsheets module, it would be a limitation as you clearly point out.
What I did to solve this issue was:
def encodeDataFrame(df, encoding='UTF-8'):
if df is not None and not df.empty:
for column, series in df.items():
if type(series.at[0]) == unicode:
try:
encodedSeries = series.str.encode(encoding)
df[column] = encodedSeries
except Exception as e:
print 'Could not encode column %s' % column
raise
And you can call the function this way:
encodeDataFrame(df)
wks.set_dataframe(df, start='A10')
This might no longer be a good solution because of a change that was made in pygsheets
to avoid this issue. See EDIT section below
Explanation:
You solve the issue by encoding the unicode
values yourself, before sending them to the set_dataframe
function.
This problem comes up whenever you try to use the Worksheet.set_dataframe
function, using a dataframe that contains unicode characters that cannot be encoded in ascii (like accents, and many other).
The exception is thrown because the set_dataframe
function attempts to cast the unicode
values into str
values (using the default encoding). For Python 2, the default encoding is ascii
and when a character out of the range of ascii
is found, the exception is thrown.
Some people have suggested reloading the sys module to circumvent this problem, but here is explained why you should not do it
The other solution I would think of would be to use the pygsheets module in Python 3, where this should no longer be a problem because the default encoding for Python 3 is UTF-8 (see docs)
Bonus:
Ask yourself:
1) Is Unicode an encoding?
2) What is an encoding?
If you hesitated with any of those questions, you should read this article, which gave me the knowledge needed to think of this solution. The time invested was completely worth it.
For more information, you can try this article which links to the previous one at the end.
EDIT:
A change was made 2 days ago (07/26/19) to pygsheets
that is supposed to fix this. It looks like the intention is to avoid encoding into the str
type, but I would think that this change might try decoding strings into the unicode
type from the default ascii encoding, which could also lead to trouble. When/If this change is released it is probably going to be better not to encode anything and pass values as unicode to the set_dataframe
function.
EDIT 2:
This change has now been released in version 2.0.2
. If my predictions were true, using the encodeDataFrame
function I suggested will result in a UnicodeDecodeError
because the Worksheet.set_dataframe
function will attempt to decode the str
values with the default ascii
encoding. So the best way to use the function will be not to have any str
values in your dataframe. If you have them, decode them to unicode
before calling the set_dataframe
function. You could have a mirror version of the
function I suggested. It would look something like this:
def decodeDataFrame(df, encoding='UTF-8'):
if df is not None and not df.empty:
for column, series in df.items():
if type(series.at[0]) == str:
try:
decodedSeries = series.str.decode(encoding)
df[column] = decodedSeries
except Exception as e:
print 'Could not decode column %s' % column
raise