
I'm new to PySpark. I wrote some Python code to read a CSV as an RDD, but I ran into `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1055: ordinal not in range(128)`.

Here is a solution I tried that didn't work: PySpark — UnicodeEncodeError: 'ascii' codec can't encode character

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hj").getOrCreate()
sc = spark.sparkContext  # reuse the context from the session

lines = sc.textFile('/hello.csv')

# first() must run before the filter that references header,
# otherwise the lambda uses an undefined name
header = lines.first()
lines = lines.filter(lambda row: row != header)
print(header)

I ran `export PYTHONIOENCODING=utf8` before spark-submit, but it didn't help. Can anyone help me? Thank you very much!
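For reference, a minimal sketch of two workarounds that avoid the default `ascii` codec entirely: read the file as raw bytes and decode explicitly, or let the DataFrame CSV reader handle the charset via its `encoding` option. The `utf-8` encoding here is an assumption — substitute the file's real encoding if it differs. The Spark calls are only attempted when pyspark is importable, so the decoding idea is shown standalone first.

```python
# Explicit decode: no reliance on the interpreter's default codec.
sample = b'caf\xc3\xa9,\xe5\x93\x88'  # UTF-8 bytes with non-ASCII chars
decoded = sample.decode('utf-8')      # explicit decode, no ascii codec

try:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hj").getOrCreate()
    sc = spark.sparkContext

    # Option 1: read raw bytes, then decode with the file's real
    # encoding (utf-8 here is an assumption):
    lines = sc.textFile('/hello.csv', use_unicode=False).map(
        lambda b: b.decode('utf-8'))

    # Option 2: let the DataFrame CSV reader handle the charset,
    # then drop to an RDD if one is needed:
    df = spark.read.option('encoding', 'UTF-8').csv('/hello.csv')
    rdd = df.rdd
except ImportError:
    pass  # pyspark not installed; the decode idea above still applies
```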

  • Could you try this code? `import sys; reload(sys); sys.setdefaultencoding('utf-8')` – notNull May 20 '19 at 03:03
  • I tried this, but got `AttributeError: module 'sys' has no attribute 'setdefaultencoding'` on Python 2.7 – chloe May 20 '19 at 03:26
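A hedged note on the comment thread: the `reload(sys)` trick is specific to Python 2, where `site.py` deletes `sys.setdefaultencoding` at startup and `reload(sys)` restores it. On Python 3 the attribute never exists, so the portable fix is to decode bytes explicitly with the file's encoding rather than changing the default codec.

```python
import sys

# On Python 3 there is no sys.setdefaultencoding at all, which is why
# the reload(sys) trick cannot work there; decode explicitly instead.
assert not hasattr(sys, 'setdefaultencoding')

raw = b'caf\xc3\xa9'         # UTF-8 bytes with one non-ASCII character
text = raw.decode('utf-8')   # explicit decode sidesteps the ascii codec
```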
