The data frame lists movies with their genres and an actor name, one row per actor. I want to collapse the duplicate rows for each movie into a single row that holds all of its actors.

NumID      col1    col2     col3  col4  col5
tt0035790  Action  History  War   2017  Walter Huston
tt0035790  Action  History  War   2017  Harry Davenport
tt0035790  Action  History  War   2017  Dana Andrews
tt0066853  Drama   2016     NA    NA    Ivan de Albuquerque
tt0066853  Drama   2016     NA    NA    Rubens Correia

This is the result that I want:

NumID      col1    col2     col3  col4  col5                 col6             col7
tt0035790  Action  History  War   2017  Walter Huston        Harry Davenport  Dana Andrews
tt0066853  Drama   2016     NA    NA    Ivan de Albuquerque  Rubens Correia   NA

I want to combine it based on the NumID.

If there is no way to do this in R and RStudio, I am comfortable writing to CSV and doing the operations in Python and pandas, but I would greatly prefer an RStudio solution.

Ronak Shah

1 Answer

Following this answer, number the rows within each NumID and then pivot col5 into one column per actor:

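library(dplyr)
library(tidyr)

df %>%
  group_by(NumID) %>%
  # number the actors within each movie: 1, 2, 3, ...
  mutate(row = row_number()) %>%
  ungroup() %>%
  # one column per actor position: actor1, actor2, actor3
  pivot_wider(
    names_from = "row",
    values_from = "col5",
    names_prefix = "actor"
  )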
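The pipeline above names the new columns actor1, actor2, actor3, while the question asks for col5, col6, col7. A minimal sketch of one way to get those names, assuming dplyr and tidyr are installed: the helper column names `actor` and `slot` are hypothetical, introduced only for illustration, and the sample data is the same as in the Data section.

```r
library(dplyr)
library(tidyr)

# Sample data, same values as the dput() output in the Data section.
df <- data.frame(
  NumID = c("tt0035790", "tt0035790", "tt0035790", "tt0066853", "tt0066853"),
  col1  = c("Action", "Action", "Action", "Drama", "Drama"),
  col2  = c("History", "History", "History", "2016", "2016"),
  col3  = c("War", "War", "War", NA, NA),
  col4  = c(2017L, 2017L, 2017L, NA, NA),
  col5  = c("Walter Huston", "Harry Davenport", "Dana Andrews",
            "Ivan de Albuquerque", "Rubens Correia"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # `actor` is a hypothetical temporary name, so that the new "col5"
  # column does not share a name with the input column.
  rename(actor = col5) %>%
  group_by(NumID) %>%
  # `slot` (hypothetical) holds the target column name for each actor:
  # the 1st actor of a movie goes to col5, the 2nd to col6, and so on.
  mutate(slot = paste0("col", row_number() + 4L)) %>%
  ungroup() %>%
  pivot_wider(names_from = slot, values_from = actor)

print(wide)
```

Movies with fewer actors than the maximum get NA in the unused columns. The offset of 4 is tied to this particular layout (four genre/year columns before the first actor), so it would need adjusting for a different data frame.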

Data

df <- structure(list(NumID = c("tt0035790", "tt0035790", "tt0035790", 
"tt0066853", "tt0066853"), col1 = c("Action", "Action", "Action", 
"Drama", "Drama"), col2 = c("History", "History", "History", 
"2016", "2016"), col3 = c("War", "War", "War", NA, NA), col4 = c(2017L, 
2017L, 2017L, NA, NA), col5 = c("Walter Huston", "Harry Davenport", 
"Dana Andrews", "Ivan de Albuquerque", "Rubens Correia")), class = "data.frame", row.names = c(NA, 
-5L))
tonybot
  • Haha, I mean it's not too much effort if I use `df <- read.delim("clipboard", header = T)` and use `dput(df)`. – tonybot Apr 27 '21 at 23:35
  • I thought it was an image. I then realized it is markdown, so one is able to copy it. Sorry about that. – Onyambu Apr 27 '21 at 23:38