I have a dataset as follows:

id   paramgroup1
1    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=REGULAR
2    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=GUEST
3    CURRENCY=INR~COUNTRY=IND

Now I want to add a count column that counts the parameters separated by the delimiter (~). After the Spark transformation, the final dataset should look like this:

id   paramgroup1                                     count
1    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=REGULAR   3
2    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=GUEST     3
3    CURRENCY=INR~COUNTRY=IND                        2

Any help would be appreciated.

ben10
  • Related: https://stackoverflow.com/questions/51450004/spark-dataframe-python-count-substring-in-string/51450277#51450277 – pault Aug 15 '18 at 16:39

1 Answer

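// In Scala: split the string on "~" and count the resulting parts.
// col(...) is used instead of $"..." so no spark.implicits._ import is needed.
import org.apache.spark.sql.functions.{col, size, split}
val df1 = df.withColumn("count", size(split(col("paramgroup1"), "~")))
df1.show()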
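
To see what the split-then-size approach computes, here is a minimal plain-Scala sketch that runs without a Spark session. The names `ParamCount` and `countParams` are hypothetical, chosen only for illustration, and returning 0 for a null or empty string is an assumption made here; Spark's `size(split(...))` may treat those cases differently.

```scala
// Plain-Scala sketch (no Spark needed) of the split-then-count logic.
// ParamCount and countParams are hypothetical names, not part of any Spark API.
import java.util.regex.Pattern

object ParamCount {
  // Counts the delimiter-separated parts of a string.
  // Assumption: null or empty input counts as 0 parameters.
  def countParams(s: String, delimiter: String = "~"): Int =
    if (s == null || s.isEmpty) 0
    // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty parts.
    else s.split(Pattern.quote(delimiter), -1).length

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      1 -> "CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=REGULAR",
      2 -> "CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=GUEST",
      3 -> "CURRENCY=INR~COUNTRY=IND"
    )
    // Print each row followed by its parameter count.
    rows.foreach { case (id, params) =>
      println(s"$id\t$params\t${countParams(params)}")
    }
  }
}
```

For the three sample rows this gives counts of 3, 3 and 2.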
Kishore