I'm new to PySpark. I wrote some Python code to read a CSV file as an RDD, but I ran into UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1055: ordinal not in range(128).
Here is the solution I tried, but it didn't work: PySpark — UnicodeEncodeError: 'ascii' codec can't encode character
from pyspark.sql import SparkSession
from pyspark import SparkContext

spark = SparkSession.builder.appName("hj").getOrCreate()
sc = SparkContext.getOrCreate()

# read the CSV file as an RDD of lines
lines = sc.textFile('/hello.csv')

# take the header first, then filter it out of the data
header = lines.first()
lines = lines.filter(lambda row: row != header)
print(header)
I ran "export PYTHONIOENCODING=utf8" before spark-submit, but it didn't help. Can anyone help me? Thank you very much!
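Is something like the following the right direction? It's just a rough sketch I put together (assuming the file really is UTF-8 encoded; /hello.csv stands for my actual path): read the file as raw bytes with use_unicode=False and decode each line explicitly.

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# read the file as raw bytes so Spark does not try to decode it for me
raw = sc.textFile('/hello.csv', use_unicode=False)

# decode each line explicitly, replacing any bytes that are not valid UTF-8
lines = raw.map(lambda line: line.decode('utf-8', errors='replace'))

header = lines.first()
data = lines.filter(lambda row: row != header)
print(header)

If the file isn't actually UTF-8 (for example GBK or Latin-1), I suppose the codec name would have to change, but I'm not sure this is the proper way to handle it in PySpark.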