I have created a DataFrame from a CSV file. It has 10 columns, two of which are actress and movie title. I want to use actress as the key and title as the value, and then reduce by key to get the list of movies for every actress. To do that, I first need to map the actress column to the movie title column. How do I get (actress, movie title) key-value tuples in Spark Scala? Also, I want to do this using basic operations, not SparkSQL.
- Could you read [How to make good reproducible Apache Spark Dataframe examples](https://stackoverflow.com/q/48427185/9613318) and [edit] the question following the guidelines? – Alper t. Turker May 11 '18 at 17:50
- What, or who, is stopping you from doing this? – Gaurang Shah May 11 '18 at 17:53
- @user9613318 I have seen that page; it's in Python and not relevant to what I am asking. – Yaseen Saleem May 11 '18 at 18:02
1 Answer
Suggestion: the question quality is low; you should look for examples online first. That said:
import org.apache.spark.sql.functions.collect_list

val df = ???  // your DataFrame loaded from the CSV
val moviesByActressDF = df.groupBy("actress_col")
  .agg(collect_list("movie_col").as("movies"))
Hope this helps, Cheers
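Since the question explicitly asks for basic operations rather than SparkSQL, the same result can be sketched with plain RDD transformations (`map` to key-value pairs, then `reduceByKey`). This is a minimal sketch, assuming the columns are named "actress" and "title" and the CSV path is `movies.csv` (both assumptions, adjust to your data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("movies-by-actress").getOrCreate()

// Path and header option are assumptions; adapt to how you load your CSV.
val df = spark.read.option("header", "true").csv("movies.csv")

// Map each row to an (actress, title) pair...
val pairs = df.rdd.map(row =>
  (row.getAs[String]("actress"), row.getAs[String]("title")))

// ...then reduce by key, collecting each actress's titles into one list.
val moviesByActress = pairs
  .mapValues(title => List(title))
  .reduceByKey(_ ++ _)

moviesByActress.collect().foreach { case (actress, titles) =>
  println(s"$actress -> ${titles.mkString(", ")}")
}
```

`mapValues` wraps each title in a single-element list so that `reduceByKey` can concatenate lists; `groupByKey` would also work but shuffles all values, so `reduceByKey` is generally preferred.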

Chitral Verma
- Thanks. This is definitely helpful. Thanks for the suggestion too. – Yaseen Saleem May 11 '18 at 20:14
- Can you mark this as accepted and upvote it if it helped? It'll help others looking for a solution. – Chitral Verma May 13 '18 at 15:17