You can use the textFile function of SparkContext and use string.printable to remove all special (non-printable) characters from the strings.
import string

sc.textFile(input_path) \
    .map(lambda x: ','.join([''.join(e for e in y if e in string.printable).strip('\"') for y in x.split(',')])) \
    .saveAsTextFile(output_path)

Here input_path is the path to your CSV file and output_path is the directory to write the cleaned output to.
Explanation
For your input line "@TSX•","None":
for y in x.split(',')
splits the line into the list ['"@TSX•"', '"None"'] (note that the surrounding double quotes are still part of each element), where y
is each element of that list as we iterate over it
for e in y if e in string.printable
checks whether each character in y is printable (i.e. present in string.printable);
the printable characters are then joined back together to form a string containing only printable characters
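As a quick sanity check of this step (plain Python, no Spark needed), filtering the first element of the sample line through string.printable drops the non-ASCII bullet while keeping everything else:

```python
import string

# '"@TSX\u2022"' is the first element x.split(',') produces for the sample line;
# the bullet character (U+2022) is not in string.printable, so it is removed
element = '"@TSX\u2022"'
printable_only = ''.join(e for e in element if e in string.printable)
print(printable_only)  # '"@TSX"' (quotes still attached; they are stripped in the next step)
```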
.strip('\"')
removes the preceding and ending inverted commas from the printable string
finally, the list of strings is converted back to a comma-separated string by ','.join([''.join(e for e in y if e in string.printable).strip('\"') for y in x.split(',')])
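You can verify the whole transformation on the sample line in pure Python, without a Spark cluster, since the lambda passed to map is ordinary Python:

```python
import string

# The same function that is passed to rdd.map in the Spark snippet above
clean = lambda x: ','.join(
    ''.join(e for e in y if e in string.printable).strip('"')
    for y in x.split(',')
)

line = '"@TSX\u2022","None"'
print(clean(line))  # @TSX,None
```

Each element is filtered to printable characters, has its surrounding quotes stripped, and the pieces are rejoined with commas.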
I hope the explanation is clear enough.