
I need some help writing R code to reshape a data frame. Below you will find what my data frame looks like. The last column, 'genres', contains a comma-separated list of genre:count pairs and needs to be split into multiple columns within the data frame.

     id value                            name movies_95_04 main_genre
1     0     8                  Tordy, G\xe9za           10     Comedy
2     1    40              Reviczky, G\xe1bor           17     Comedy
3    10    42                   Cserna, Antal           19      Drama
4   100   229                  Marcella, Jade          178      Adult
5  1000     9               Delarive, Mathieu           13      Drama
6 10000    12 De Grandpr\xe9, Fr\xe9d\xe9rick           11      Drama
                                                                          genres
1               Adventure:1,Animation:1,Comedy:2,Family:1,NULL:3,Romance:1,War:1
2            Comedy:7,Crime:2,Fantasy:1,Musical:1,NULL:2,Romance:2,Short:1,War:1
3 Adventure:1,Animation:1,Comedy:2,Drama:5,Fantasy:1,NULL:5,Romance:3,Thriller:1
4                                              Adult:135,Crime:1,NULL:41,Short:1
5                                       Comedy:1,Drama:2,Horror:1,NULL:7,Short:2
6                                   Comedy:1,Drama:5,NULL:3,Romance:1,Thriller:1

This is what I've done so far, but instead it gives me a new data frame that contains empty cells.

Merged_actors2_ <- str_split_fixed(Merged_actors$genres, ",", 19)
Ronak Shah
ZohakM

1 Answer

Split the data on commas (,) into separate rows, split each row on the colon (:) into two columns, and then reshape the data into wide format.

library(tidyr)

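Merged_actors %>%
  # one row per "genre:count" pair
  separate_rows(genres, sep = ',') %>%
  # split each pair into a genre name and an integer count; because the new
  # column is also called `value`, the original `value` column is lost
  # (it is absent from the output below)
  separate(genres, c('category', 'value'), sep = ':', convert = TRUE) %>%
  # one column per genre; genres missing for an actor are filled with 0
  pivot_wider(names_from = category, values_from = value, values_fill = 0)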

#     id name  movies_95_04 main_genre Adventure Animation Comedy Family
#  <int> <chr>        <int> <chr>          <int>     <int>  <int>  <int>
#1     0 Tord…           10 Comedy             1         1      2      1
#2     1 Revi…           17 Comedy             0         0      7      0
#3    10 Cser…           19 Drama              1         1      2      0
#4   100 Marc…          178 Adult              0         0      0      0
#5  1000 Dela…           13 Drama              0         0      1      0
#6 10000 DeGr…           11 Drama              0         0      1      0
# … with 11 more variables: `NULL` <int>, Romance <int>, War <int>,
#   Crime <int>, Fantasy <int>, Musical <int>, Short <int>, Drama <int>,
#   Thriller <int>, Adult <int>, Horror <int>
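
One thing to watch for: the input already has a column called `value`, and it does not appear in the printed result. Here is a minimal sketch of a variant that keeps it, assuming it is acceptable to give the intermediate columns different names. The names `genre` and `count` are illustrative, and `actors` is a two-row stand-in built from rows 1 and 4 of the question's data, not the full `Merged_actors`.

```r
library(tidyr)

# Two-row stand-in for Merged_actors (rows 1 and 4 of the question's data).
actors <- data.frame(
  id     = c(0L, 100L),
  value  = c(8L, 229L),
  genres = c(
    "Adventure:1,Animation:1,Comedy:2,Family:1,NULL:3,Romance:1,War:1",
    "Adult:135,Crime:1,NULL:41,Short:1"
  ),
  stringsAsFactors = FALSE
)

wide <- actors %>%
  # one row per "genre:count" pair
  separate_rows(genres, sep = ",") %>%
  # `genre` and `count` are illustrative names, chosen so that they do not
  # clash with the existing `value` column
  separate(genres, c("genre", "count"), sep = ":", convert = TRUE) %>%
  # one column per genre; genres missing for an actor become 0
  pivot_wider(names_from = genre, values_from = count, values_fill = 0L)

names(wide)
```

With this naming, `id` and `value` are both carried through as identifier columns. The genre that is literally named NULL is not a syntactic R name, so it has to be accessed as `wide[["NULL"]]` or with backticks.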

data

Merged_actors <- structure(list(
  id = c(0L, 1L, 10L, 100L, 1000L, 10000L),
  value = c(8L, 40L, 42L, 229L, 9L, 12L),
  name = c("Tordy,G\u00e9za", "Reviczky,G\u00e1bor", "Cserna,Antal",
           "Marcella,Jade", "Delarive,Mathieu",
           "DeGrandpr\u00e9,Fr\u00e9d\u00e9rick"),
  movies_95_04 = c(10L, 17L, 19L, 178L, 13L, 11L),
  main_genre = c("Comedy", "Comedy", "Drama", "Adult", "Drama", "Drama"),
  genres = c(
    "Adventure:1,Animation:1,Comedy:2,Family:1,NULL:3,Romance:1,War:1",
    "Comedy:7,Crime:2,Fantasy:1,Musical:1,NULL:2,Romance:2,Short:1,War:1",
    "Adventure:1,Animation:1,Comedy:2,Drama:5,Fantasy:1,NULL:5,Romance:3,Thriller:1",
    "Adult:135,Crime:1,NULL:41,Short:1",
    "Comedy:1,Drama:2,Horror:1,NULL:7,Short:2",
    "Comedy:1,Drama:5,NULL:3,Romance:1,Thriller:1"
  )
), class = "data.frame", row.names = c(NA, -6L))
Ronak Shah