I am reading some tab-separated files from Google Cloud Storage using PySpark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.option("sep", "\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True, encoding='utf-8')
df.count()
df.show(10)
However, I keep getting an error on the df.show(10) line:
df.show(10)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)
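For context on the character in the message: u'\ufffd' is U+FFFD, the Unicode replacement character, which a decoder substitutes for bytes that are not valid in the encoding it was told to use. A minimal Spark-free sketch, assuming (hypothetically) that the input file stores text in Latin-1 rather than UTF-8:

```python
# Spark-free sketch of where U+FFFD comes from. Assumption (hypothetical): the
# file stores "café" in Latin-1, where 'é' is the single byte 0xE9.
raw_bytes = b"caf\xe9"

# Decoding as UTF-8 with replacement: 0xE9 starts a multi-byte UTF-8 sequence
# that never completes, so the decoder substitutes U+FFFD.
as_utf8 = raw_bytes.decode("utf-8", "replace")

# Decoding with the file's actual (assumed) encoding recovers the text.
as_latin1 = raw_bytes.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

If the files really are in a non-UTF-8 encoding, passing that encoding (for example "ISO-8859-1") to the read may avoid the replacement characters; this is an assumption about the data that the traceback alone does not confirm.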
I searched online, and this seems to be a common error; the suggested fix is to add the "UTF-8" encoding to spark.read.option, which I have already done. That doesn't help, and I still get this error. Could anyone help? Thanks in advance.
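One possible reason the read option has no effect: it only controls how Spark decodes the input files, while show() prints the resulting table as text, and (the u'...' repr in the message suggests Python 2) that text is then encoded with the stdout encoding, which the message reports as 'ascii'. A sketch of that distinction, using plain string encoding as a stand-in for the output stream; the table text is hypothetical:

```python
# Spark-free sketch: the same already-decoded text succeeds or fails depending
# only on the encoding used for *output*; how the input was read never enters.

# Hypothetical stand-in for the text table that df.show(10) prints.
table_text = u"+----+\n|name|\n+----+\n|caf\ufffd|\n+----+\n"

def render(text, encoding, errors="strict"):
    """Encode text the way an output stream with `encoding` would."""
    return text.encode(encoding, errors)

# UTF-8 can represent U+FFFD, so this succeeds.
utf8_bytes = render(table_text, "utf-8")

# Lossy ASCII: U+FFFD is written as '?' instead of raising.
lossy_bytes = render(table_text, "ascii", "replace")

# Strict ASCII reproduces the same kind of UnicodeEncodeError as reported.
try:
    render(table_text, "ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)

print(len(utf8_bytes))
print(lossy_bytes.decode("ascii"))
```

Under that assumption, a commonly used way to give Python's stdout a UTF-8 encoding is to set the PYTHONIOENCODING environment variable (for example PYTHONIOENCODING=utf-8) for the driver process; whether it applies here depends on how the job is launched.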