I am new to Spark and Scala, so I'm wondering how to do this. In Python pandas I would just `.apply()` a function to the grouped column and it would work, but I don't know how to do the same in Spark using Scala.
I have a DataFrame of user names and the sites they have visited. After grouping by "user_name", I want to combine each user's sites (an array of strings) into one sorted, ":"-separated string.
val df = Seq(
  ("user1", "facebook.com"), ("user1", "msn.com"), ("user1", "linkedin.com"),
  ("user2", "google.com"), ("user2", "apple.com")
).toDF("user_name", "sites")
df.show
+---------+------------+
|user_name| sites|
+---------+------------+
| user1|facebook.com|
| user1| msn.com|
| user1|linkedin.com|
| user2| google.com|
| user2| apple.com|
+---------+------------+
val grp = df.groupBy("user_name")
Now I want to apply this join function to the grouped "sites" column:
val jn = (url: Array[String]) => url.sortWith(_ < _).mkString(":")
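To be clear, the sort-and-join step on its own is plain Scala and does what I want on a local collection (I'm using `Seq` here since that's what I believe a collected column would give me):

```scala
// Sort the sites alphabetically and join them with ":"
val jn = (urls: Seq[String]) => urls.sortWith(_ < _).mkString(":")

// On a plain collection this produces the per-user string I'm after:
jn(Seq("msn.com", "facebook.com", "linkedin.com"))
// "facebook.com:linkedin.com:msn.com"
```

The question is how to apply this per group in Spark.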
What I want:
+---------+---------------------------------+
|user_name| sites |
+---------+---------------------------------+
| user1|facebook.com:linkedin.com:msn.com|
| user2|apple.com:google.com |
+---------+---------------------------------+
How do I convert the GroupedData back to a DataFrame in Spark?
How do I print the grouped DataFrame as-is, right after the groupBy?
I have used a UDF to change a column in a Spark DataFrame, but I don't know how to use one on GroupedData. Is there a way to do that?
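From what I've read, one approach might be to collect the grouped values with `collect_list` and apply the UDF inside `.agg(...)`, which would also turn the grouped data back into a DataFrame. Is this the right way? A sketch of what I mean (assuming `collect_list` and `agg` work the way I think they do):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, udf}

val spark = SparkSession.builder.master("local[*]").appName("group-join").getOrCreate()
import spark.implicits._

val df = Seq(
  ("user1", "facebook.com"), ("user1", "msn.com"), ("user1", "linkedin.com"),
  ("user2", "google.com"), ("user2", "apple.com")
).toDF("user_name", "sites")

// Wrap the sort-and-join logic as a UDF over the collected list of sites
val jnUdf = udf((urls: Seq[String]) => urls.sortWith(_ < _).mkString(":"))

// groupBy returns a RelationalGroupedDataset; .agg(...) yields a DataFrame again
val result = df.groupBy("user_name")
  .agg(jnUdf(collect_list(col("sites"))).as("sites"))
```

If that's correct, `result.show(false)` should print the table I described above, but I'd like confirmation that this is the idiomatic way rather than trying to map over the GroupedData directly.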