I have written PySpark code that does the following operation, but it is not working as intended. Can anyone point out my mistake, please?
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Data cleaning function
def clean_data(data):
    rep = data.replace('/','')
    rep = data.replace('-','')
    rep = data.replace('+','')
    rep = data.replace(' ','')
    return rep

#clean_data_udf_int = udf(lambda z: clean_data(z), StringType())
#con.show(4)
clean_data_udf = udf(clean_data, StringType())
con = con.withColumn('ph1_f',clean_data_udf('phone1'))
The input dataframe, con, is:
id  phone  phone1
1   098    /90
2   + 91   -90
The output dataframe I want is:
id  phone  phone1
1   98     90
2   91     90
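
One possible cause, sketched here in plain Python so it can be checked without a Spark session: every call in clean_data starts again from data rather than from rep, so only the last replace (the one removing spaces) seems to take effect. Chaining the calls on rep keeps the result of each step. The name clean_data_chained is hypothetical, the None check is an assumption about null values in the column, and this sketch does not strip the leading zero that the desired output shows for 098.

```python
def clean_data_chained(data):
    """Remove '/', '-', '+' and spaces, feeding each result into the next step."""
    if data is None:  # assumption: null column values reach the function as None
        return None
    rep = data
    for ch in ('/', '-', '+', ' '):
        # Replace on rep, not on data, so earlier removals are kept
        rep = rep.replace(ch, '')
    return rep

print(clean_data_chained('/90'))
print(clean_data_chained('+ 91'))
```

If this matches the intent, it could presumably be registered the same way as before, with udf(clean_data_chained, StringType()), and applied to phone and phone1 with withColumn.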