My data frame lists movies with their genres and an actor name, one actor per row, so each movie appears several times. I want to collapse the duplicate rows for each movie into a single row that contains all of its actors.

NumID | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|
tt0035790 | Action | History | War | 2017 | Walter Huston |
tt0035790 | Action | History | War | 2017 | Harry Davenport |
tt0035790 | Action | History | War | 2017 | Dana Andrews |
tt0066853 | Drama | 2016 | NA | NA | Ivan de Albuquerque |
tt0066853 | Drama | 2016 | NA | NA | Rubens Correia |

This is the result that I want:

NumID | col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|---|
tt0035790 | Action | History | War | 2017 | Walter Huston | Harry Davenport | Dana Andrews |
tt0066853 | Drama | 2016 | NA | NA | Ivan de Albuquerque | Rubens Correia | NA |

The rows should be combined based on NumID.

If there is no way to do this in R and RStudio, I am comfortable writing the data to CSV and doing the operation in Python with pandas, but I would greatly prefer an R solution.
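
To make the target shape concrete, here is a rough sketch of the reshaping in plain Python (standard library only, so neither pandas nor the R solution I am hoping for). The function name `combine_actors` is made up for illustration. It assumes the actor is always in `col5`, that the other columns are identical for every row of a given NumID, and that missing actors should be filled with the string `NA`.

```python
def combine_actors(rows, key="NumID", actor_col="col5", first_actor_index=5):
    """Collapse rows sharing `key` into one row with one actor per column.

    Assumption: all columns other than `actor_col` are identical within a
    group, so the first row of each group supplies them.
    """
    grouped = {}  # dicts keep insertion order, so movies stay in input order
    for row in rows:
        entry = grouped.setdefault(row[key], {"base": row, "actors": []})
        entry["actors"].append(row[actor_col])

    # The widest group decides how many actor columns every row gets.
    width = max((len(e["actors"]) for e in grouped.values()), default=0)

    result = []
    for entry in grouped.values():
        out = {k: v for k, v in entry["base"].items() if k != actor_col}
        padded = entry["actors"] + ["NA"] * (width - len(entry["actors"]))
        for offset, actor in enumerate(padded):
            # Actor columns are named col5, col6, col7, ...
            out[f"col{first_actor_index + offset}"] = actor
        result.append(out)
    return result


# The sample data from the first table above.
header = ["NumID", "col1", "col2", "col3", "col4", "col5"]
data = [
    ["tt0035790", "Action", "History", "War", "2017", "Walter Huston"],
    ["tt0035790", "Action", "History", "War", "2017", "Harry Davenport"],
    ["tt0035790", "Action", "History", "War", "2017", "Dana Andrews"],
    ["tt0066853", "Drama", "2016", "NA", "NA", "Ivan de Albuquerque"],
    ["tt0066853", "Drama", "2016", "NA", "NA", "Rubens Correia"],
]
rows = [dict(zip(header, values)) for values in data]

combined = combine_actors(rows)
for row in combined:
    print(" | ".join(row.values()))
```

The number of actor columns is taken from the movie with the most actors, which is why the second movie gets an `NA` in `col7`.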