Spark replace characters AąCć

Question

I used the translate function in Spark, but I would like to get the same result using Spark's regex functions (regexp_replace etc.). Could you please help me rewrite it?

df.withColumn("name_surname", translate(col("name_surname"), "ĄąĆcĘeŁłŹźŻŚśÓóŃń", "AaCcEeLlZzZSsOoNn"))

score 0 · Accepted Answer · answered Aug 29 '22 at 08:09

I think there's no Spark function to do that, but you can always use one of the plain Java methods, e.g. as suggested in the answers to this question, and then wrap it in a UDF.

import org.apache.commons.lang3.StringUtils
import org.apache.spark.sql.functions.udf
val stripAccents = udf((s: String) => StringUtils.stripAccents(s))
df.withColumn("name_surname", stripAccents($"name_surname")).show

+-----------------+
|     name_surname|
+-----------------+
|AaCcEeLlZzZSsOoNn|
+-----------------+
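If you would rather not depend on Commons Lang, a regex-based variant using only JDK classes might look roughly like the sketch below. The names AccentStripper and stripDiacritics are hypothetical and only for illustration. It assumes that decomposing each character (NFD) and deleting the combining marks with a regex is enough for your data; Ł and ł have no Unicode decomposition, so they are mapped explicitly.

```scala
import java.text.Normalizer

// Hypothetical helper, not part of Spark or Commons Lang.
object AccentStripper {
  def stripDiacritics(s: String): String =
    if (s == null) null // keep nulls as nulls, as Spark columns may contain them
    else
      Normalizer
        .normalize(s, Normalizer.Form.NFD) // e.g. "ą" becomes "a" + combining ogonek
        .replaceAll("\\p{M}", "")          // regex: remove every combining mark
        .replace('\u0141', 'L')            // Ł does not decompose, map it by hand
        .replace('\u0142', 'l')            // same for ł
}
```

It should be possible to wrap it the same way as above, e.g. udf(AccentStripper.stripDiacritics _), and apply it with withColumn.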
