How to add the count of column elements in a specific column of a dataset in Spark

Question

I have a dataset as follows:

id   paramgroup1
1    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=REGULAR
2    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=GUEST
3    CURRENCY=INR~COUNTRY=IND~CUSTCATEGORY=REGULAR

Now I want to add a count column that counts the parameters separated by the delimiter (~). The final dataset after the Spark transformation should look like this:

id   paramgroup1                                     count
1    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=REGULAR   3
2    CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=GUEST     3
3    CURRENCY=INR~COUNTRY=IND                        2

Any help would be appreciated.

Related: https://stackoverflow.com/questions/51450004/spark-dataframe-python-count-substring-in-string/51450277#51450277 — pault, Aug 15 '18 at 16:39

score 0 · Answer 1 · answered Aug 14 '18 at 05:20

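// In Scala: split the string on "~" and take the size of the resulting array.
import org.apache.spark.sql.functions.{size, split}
import spark.implicits._ // enables the $"..." syntax; assumes a SparkSession named `spark`
val df1 = df.withColumn("count", size(split($"paramgroup1", "~")))
df1.show()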

Kishore


Assumes at least one entry in that field. – thebluephantom Aug 14 '18 at 11:17
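
To illustrate the caveat in the comment above: splitting an empty string yields an array containing one empty string, so size(split(...)) reports 1 rather than 0, and a null value yields -1 or null depending on the Spark version and configuration. The following is a minimal sketch of one way to guard against both cases, not part of the original answer. The local SparkSession, the app name, and the extra empty and null rows are assumptions added for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length, size, split, when}

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[1]").appName("param-count").getOrCreate()

// Two rows in the style of the question, plus a hypothetical empty row and null row.
val df = spark.createDataFrame(Seq[(Int, Option[String])](
  (1, Some("CURRENCY=USD~COUNTRY=USA~CUSTCATEGORY=REGULAR")),
  (2, Some("CURRENCY=INR~COUNTRY=IND")),
  (3, Some("")),
  (4, None)
)).toDF("id", "paramgroup1")

// Treat null or empty strings as zero parameters;
// otherwise count the "~"-separated parts as in the answer.
val counted = df.withColumn(
  "count",
  when(col("paramgroup1").isNull || length(col("paramgroup1")) === 0, 0)
    .otherwise(size(split(col("paramgroup1"), "~")))
)

counted.show(false)
```

With this guard, the empty and null rows get a count of 0, while the populated rows are counted exactly as in the answer. Whether 0 is the right value for a missing field depends on the use case.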
