
I have a CSV file in French that contains special characters (é, à, è, ç). I put this CSV file into HDFS via Spark 2 / Scala 2.11, did some transformations on the data, then transferred my DataFrame to Elasticsearch 5.6.

These special characters appear as garbled characters in the Kibana dashboard.

I want to replace these special characters with their unaccented equivalents, like:

é = e
è = e
à = a

I tried two approaches:

val urlCleaner = (s: String) => {
  if (s == null) null else s.replaceAll("é", "e")
}

And

val newsjoined_df2 = My_Dataframe.withColumn("nom_equipe", when(col("nom_equipe").equalTo("é"), "e").otherwise(col("nom_equipe")))

But neither works. Can someone suggest a solution?

vero

1 Answer


You can create a UDF:

import org.apache.spark.sql.functions
import spark.implicits._

// Map each accented character to its unaccented equivalent.
// The null check avoids a NullPointerException on null values.
val removeChars = functions.udf((s: String) => {
  if (s == null) null
  else s.replaceAll("è", "e")
        .replaceAll("é", "e")
        .replaceAll("à", "a")
        .replaceAll("ç", "c")
})

Then call withColumn on your DataFrame with that UDF, passing it the column:

df.withColumn("nom_equipe", removeChars($"nom_equipe"))

Here is a quick test:

Input:

+------------+
|  nom_equipe|
+------------+
|       héllo|
|     chénene|
+------------+

Output:

+------------+
|  nom_equipe|
+------------+
|       hello|
|     chenene|
+------------+
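For reference, a minimal way to reproduce this test (assuming a SparkSession is in scope as spark):

import spark.implicits._

// Build a one-column test DataFrame and apply the UDF.
val df = Seq("héllo", "chénene").toDF("nom_equipe")
df.withColumn("nom_equipe", removeChars($"nom_equipe")).show()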
SCouto
  • Thank you for your answer. I resolved it by adding spark.driver.extraJavaOptions=-Dfile.encoding=iso-8859-1 to the config.properties file, and by also adding this option to the command that loads the CSV file from HDFS (see the sketch below). – vero Mar 20 '18 at 09:44
  • Can you suggest a solution for this question: https://stackoverflow.com/questions/49274012/calculate-the-deviation-time-between-tow-dates?noredirect=1#comment85553831_49274012 – vero Mar 20 '18 at 10:13
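Regarding the encoding fix mentioned in the first comment, the root cause can also be addressed at read time; a minimal sketch, assuming the file is Latin-1 encoded (path and options are hypothetical):

// Read the CSV with an explicit charset so accented characters are
// decoded correctly in the first place.
val raw = spark.read
  .option("header", "true")          // hypothetical: adjust to your file
  .option("encoding", "ISO-8859-1")  // a.k.a. Latin-1
  .csv("hdfs:///path/to/file.csv")   // hypothetical path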