I have created a DataFrame from a CSV file. It has 10 columns, two of which are actress and movie title. I want to use actress as the key and title as the value, and then reduce by key to get the list of movies for every actress. To do that, I first need to map the actress column to the movie title column. How do I get (actress, movie title) key-value tuples in Spark Scala? Also, I want to do this using basic operations, not SparkSQL.
- Could you read [How to make good reproducible Apache Spark Dataframe examples](https://stackoverflow.com/q/48427185/9613318) and [edit] the question following the guidelines? – Alper t. Turker May 11 '18 at 17:50
- What, or who, is stopping you from doing this? – Gaurang Shah May 11 '18 at 17:53
- @user9613318 I have seen that page; it's in Python and not relevant to what I am asking. – Yaseen Saleem May 11 '18 at 18:02
1 Answer
Suggestion: the question quality is low; you should look for examples online first. That said:
import org.apache.spark.sql.functions.collect_list

val df = ???  // your DataFrame loaded from the CSV
val moviesByActressDF = df.groupBy("actress_col")
  .agg(collect_list("movie_col").as("movies"))
Hope this helps, Cheers
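Since the question explicitly asks for basic operations rather than SparkSQL, the same result can be sketched with plain RDD transformations (`map` to key-value pairs, then `reduceByKey`). This is a minimal sketch, assuming the columns are named "actress" and "title" and the CSV path is `movies.csv` (both assumptions, adjust to your data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("movies-by-actress").getOrCreate()

// Path and header option are assumptions; adapt to how you load your CSV.
val df = spark.read.option("header", "true").csv("movies.csv")

// Map each row to an (actress, title) pair...
val pairs = df.rdd.map(row =>
  (row.getAs[String]("actress"), row.getAs[String]("title")))

// ...then reduce by key, collecting each actress's titles into one list.
val moviesByActress = pairs
  .mapValues(title => List(title))
  .reduceByKey(_ ++ _)

moviesByActress.collect().foreach { case (actress, titles) =>
  println(s"$actress -> ${titles.mkString(", ")}")
}
```

`mapValues` wraps each title in a single-element list so that `reduceByKey` can concatenate lists; `groupByKey` would also work but shuffles all values, so `reduceByKey` is generally preferred.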

Chitral Verma
- Thanks. This is definitely helpful. Thanks for the suggestion too. – Yaseen Saleem May 11 '18 at 20:14
- Can you mark this as accepted and upvote it if it helped? It'll help others looking for a solution. – Chitral Verma May 13 '18 at 15:17