I have a PySpark program that writes a pipe-delimited file. I want to write the file using Ç (C with cedilla) as the delimiter instead of the pipe.
Sample code:
from pyspark.sql import functions as F
from pyspark.sql.types import StringType
separator = '|'
concat_udf1 = F.udf(lambda cols: "".join([x+separator if x is not None else separator for x in cols]), StringType())
Current dataframe output:
7|2020-03-31|xyz
7|2020-03-31|abc
Desired dataframe output:
7Ç2020-03-31Çxyz
7Ç2020-03-31Çabc
If I change the separator to Ç, I get the error below:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Any help would be appreciated - TIA
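For reference, a minimal sketch of the joining logic with a Unicode separator. It assumes the error comes from Python 2, where 'Ç' written as a plain string literal is the UTF-8 byte string '\xc3\x87', and concatenating it with a unicode column value triggers an implicit ASCII decode. The helper name join_columns and the constant SEPARATOR are hypothetical, not part of the original code, and the sketch is plain Python so it can be checked without a Spark session.

```python
# -*- coding: utf-8 -*-

# Hypothetical constant: the delimiter as a unicode literal (U+00C7),
# rather than a UTF-8 encoded byte string.
SEPARATOR = u"\u00c7"


def join_columns(cols, separator=SEPARATOR):
    """Join column values with `separator`, treating None as an empty field.

    Unlike the sample code, this puts the separator only between values,
    so no trailing separator is appended.
    """
    return separator.join(u"" if x is None else u"{}".format(x) for x in cols)
```

Assuming this is the cause, the same function could be wrapped as F.udf(join_columns, StringType()) in place of the lambda in the sample code. Note that it deliberately differs from the sample by not adding a separator after the last column, which matches the desired output shown above.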