I got the encoding error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 3: ordinal not in range(128)
at the following python (pyspark) code, where row is the data frame row:
def rowToLine(row):
line = str(row[0]).strip()
columnNum = 44
for k in xrange(1, columnNum):
line = line + "\t"
line = line + str(row[k]).strip() # encoding error here
return line
I also tried the join below:
def rowToLine(row):
s = "\t"
return s.join(row)
but some values of the row is int, so I got errors:
TypeError: sequence item 19: expected string or Unicode, int found
Does anyone know how to fix this? Thanks!