
I'm having a problem uploading a Python pandas DataFrame to BigQuery using the df.to_gbq function. The issue arises from non-Latin (and non-ASCII) characters in the DataFrame. When I try to upload it, I get this error:

'latin-1' codec can't encode characters in position 126-129: Body ('עמית') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

This happens even though the DataFrame's encoding is UTF-8, which suggests that the function converts the data to Latin-1 before uploading to BigQuery and therefore can't handle any non-Latin characters. I couldn't find any argument of the function that would let me change the encoding. I also tried uploading the same data from a CSV file: a UTF-8 CSV could be uploaded directly, but not when I first loaded the data into a pandas DataFrame (using read_csv) and then uploaded it to BigQuery.
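
For illustration, here is a minimal sketch of the encoding step that the error message describes. The wording of the message matches what Python's standard `http.client` module raises when a text request body cannot be encoded as Latin-1, so the failure is most likely in the HTTP layer rather than in the DataFrame itself (this is an inference from the message, not something confirmed by a traceback). The helper name `try_encode` is made up for this example.

```python
# The string from the error message: four Hebrew letters.
text = "עמית"


def try_encode(value, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError:
        return None


# Latin-1 only covers code points U+0000 to U+00FF, so Hebrew cannot be encoded.
latin1_result = try_encode(text, "latin-1")

# UTF-8 can represent any Unicode string; each of these letters takes two bytes.
utf8_result = try_encode(text, "utf-8")

if __name__ == "__main__":
    print("latin-1:", latin1_result)
    print("utf-8:  ", utf8_result)
```

This only shows why a Latin-1 encode of the request body fails; it does not change how to_gbq builds the request.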

Any suggestions on how to upload this DataFrame to BigQuery correctly? Any workarounds (I just need it uploaded, not necessarily with pandas to_gbq)? Thanks.

Amit Sadeh
  • What do you mean when you say "suggesting that the function converts to latin-1 prior to uploading to BQ and therefore can't handle any non -latin chars"? Have you tried changing the encoding line by line? Nonetheless, I think converting the whole document would be a far better option. There is a Stack Overflow [post](https://stackoverflow.com/questions/4182603/how-to-convert-a-string-to-utf-8-in-python) that explains how to do so. – Ggrimaldo Jun 04 '18 at 08:04
  • The DataFrame is already in UTF-8. I think that the pandas to_gbq function for some reason encodes the data as 'latin-1' and then tries to send it, which causes the error mentioned. I also tried to re-encode it as UTF-8 using .str.encode(encoding='utf-8',errors='strict'), but I still get the same error. – Amit Sadeh Jun 04 '18 at 10:55
  • What version of Python are you using? Also, if you have an example of the code you are running, that might be helpful as well. – Willian Fuks Jun 05 '18 at 01:59
  • Thank you, guys, but I eventually found that this bug had already been fixed and the fix merged in version 0.3.1 of pandas-gbq (https://github.com/pydata/pandas-gbq/issues/106). It turns out all I needed to do was update the package. – Amit Sadeh Jun 05 '18 at 05:15
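
Following the last comment, a rough way to check whether the installed pandas-gbq is at least version 0.3.1 could look like the sketch below. The helper names (`version_tuple`, `has_fix`, `installed_pandas_gbq`) are made up for this example, the version parsing is deliberately simplistic, and `importlib.metadata` assumes Python 3.8 or newer.

```python
import re
from importlib import metadata

# Version named in the comment above as containing the fix.
MIN_FIXED = (0, 3, 1)


def version_tuple(text):
    """Turn '0.3.1' into (0, 3, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed):
    """True if the given version string is 0.3.1 or newer."""
    return version_tuple(installed) >= MIN_FIXED


def installed_pandas_gbq():
    """Return the installed pandas-gbq version string, or None if it is missing."""
    try:
        return metadata.version("pandas-gbq")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_pandas_gbq()
    if current is None:
        print("pandas-gbq is not installed")
    elif has_fix(current):
        print(f"pandas-gbq {current} should already include the fix")
    else:
        print(f"pandas-gbq {current} is older than 0.3.1; consider upgrading")
```

If the check reports an older version, upgrading with `pip install --upgrade pandas-gbq` is the usual route.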

0 Answers