I have a Spark DataFrame that must be saved to PostgreSQL. I think I have the appropriate Python statement except for the encoding options, since I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 95: ordinal not in range(128)
My current statement is:
df.write.jdbc(url=jdbc_url, table='{}.{}'.format(schema_name, table_name), mode='overwrite', properties=properties)
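For context, here is a fuller sketch of how the surrounding pieces are set up (the URL, schema, table and credentials below are placeholders, not my real values):

# Placeholder connection details, just to make the snippet self-contained
jdbc_url = 'jdbc:postgresql://localhost:5432/db_test'
schema_name = 'public'
table_name = 'my_table'
properties = {
    'user': 'my_user',
    'password': 'my_password',
    'driver': 'org.postgresql.Driver',
}

df.write.jdbc(url=jdbc_url,
              table='{}.{}'.format(schema_name, table_name),
              mode='overwrite',
              properties=properties)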
It seems that by default PySpark tries to encode the DataFrame as ASCII, so I should specify the correct encoding (UTF-8). How can I do that?
I've tried option("charset", "utf-8"), option("encoding", "utf-8"), and many other combinations I've seen on the Internet. I've also tried adding "client_encoding": "utf8" to the properties passed to jdbc. But nothing seems to work.
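Concretely, the attempts looked roughly like this (a sketch; the option names are ones I found online, and I'm not sure the JDBC writer honors any of them):

# Attempt 1: pass client_encoding through the JDBC properties
properties = {
    'user': 'my_user',
    'password': 'my_password',
    'driver': 'org.postgresql.Driver',
    'client_encoding': 'utf8',
}
df.write.jdbc(url=jdbc_url, table='{}.{}'.format(schema_name, table_name),
              mode='overwrite', properties=properties)

# Attempt 2: set encoding options on the DataFrameWriter
df.write.option('charset', 'utf-8') \
        .option('encoding', 'utf-8') \
        .jdbc(url=jdbc_url, table='{}.{}'.format(schema_name, table_name),
              mode='overwrite', properties=properties)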
Any help would be really appreciated.
Additional info:
- Python 2.7
- Spark 1.6.2
EDIT 1
My database is UTF-8 encoded:
$ sudo -u postgres psql db_test -c 'SHOW SERVER_ENCODING'
server_encoding
-----------------
UTF8
(1 row)
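For completeness, the client-side encoding can be checked the same way (I'm only showing the command here):

$ sudo -u postgres psql db_test -c 'SHOW CLIENT_ENCODING'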
EDIT 2
I noticed that, together with this error, another one was hidden in the logs: the PostgreSQL driver was complaining that the table I wanted to create already existed! So I dropped it from PostgreSQL and everything worked like a charm :) Unfortunately, I haven't been able to fully understand how one thing was related to the other... Maybe the already-existing table used ASCII encoding and there was some kind of incompatibility between it and the data that was going to be saved?
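For anyone who hits the same thing: dropping the stale table before re-running the job is a one-liner in psql (adjust the schema and table name to your case):

$ sudo -u postgres psql db_test -c 'DROP TABLE IF EXISTS public.my_table'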