I am trying to clean a data set and create 3 variables named Adventure, Action and Comedy. The raw data set has 3000 observations (imported as: dat). I am showing only a few observations:
id Runtime Genres
37 75 animation, adventure, family, fantasy, musical
1 162 action, adventure, fantasy, sci_fi
95 126 action, fantasy
100 101 comedy, drama, fantasy
82 136 action, adventure, sci-fi
99 117 animation, adventure, comedy, family, sport
91 95 animation, comedy, crime, family
After importing the dataset into R, I separated Genres into 5 columns using the following code:
library(tidyr)  # provides separate() and re-exports %>%
dat1 <- dat %>% separate(Genres, c("Genres1", "Genres2", "Genres3", "Genres4", "Genres5"), sep = ",", extra = "drop", fill = "right")
id Runtime Genres1 Genres2 Genres3 Genres4 Genres5
37 75 animation adventure family fantasy musical
1 162 action adventure fantasy sci_fi
95 126 action fantasy
100 101 comedy drama fantasy
82 136 action adventure sci-fi
99 117 animation adventure comedy family sport
91 95 animation comedy crime family
How do I collapse all the genre columns into one indicator variable each for action, adventure, and comedy?
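One way to skip the separate() step entirely, offered as a sketch rather than the definitive answer: test the original Genres column once per target genre with grepl(). The regular expression requires the genre to sit between commas (or at the start or end of the string), so it only matches a whole entry, and it tolerates spaces after the commas. The data frame below is a hypothetical copy of the rows shown above, not the real 3000-row file.

```r
# Hypothetical copy of the sample rows shown in the question
dat <- data.frame(
  id = c(37, 1, 95, 100, 82, 99, 91),
  Runtime = c(75, 162, 126, 101, 136, 117, 95),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, adventure, fantasy, sci_fi",
             "action, fantasy",
             "comedy, drama, fantasy",
             "action, adventure, sci-fi",
             "animation, adventure, comedy, family, sport",
             "animation, comedy, crime, family"),
  stringsAsFactors = FALSE
)

# New column name -> genre to look for
targets <- c(Adventure = "adventure", Action = "action", Comedy = "comedy")

for (nm in names(targets)) {
  # Whole entry only: preceded by start or a comma, followed by a comma or end
  pattern <- paste0("(^|,)\\s*", targets[[nm]], "\\s*(,|$)")
  dat[[nm]] <- as.integer(grepl(pattern, dat$Genres))
}

dat
```

Because grepl() is vectorised over the column, the same loop handles all 3000 rows without any per-row code.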
I tried the following code. First I created an empty column for adventure:
dat1["adventure"] <- NA
dat1$adventure <- ifelse(dat1$Genres1=="adventure",1,(ifelse(dat1$Genres2=="adventure",1,0)))
Following a suggestion, I shortened the code to:
dat1$adventure <- ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure" | dat1$Genres3=="adventure" | dat1$Genres4=="adventure" ),1, 0)
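If you keep the separated columns, here is a sketch of a version that checks all five of them. The attempt above stops at Genres4, and if the unused slots are NA (as fill = "right" produces), a comparison like NA == "adventure" gives NA rather than 0. Trimming each value and treating NA as FALSE avoids both issues. The dat1 excerpt is hypothetical, built to look like separate(sep = ",") output, and has_genre() is a made-up helper name.

```r
# Hypothetical excerpt of dat1 as separate(sep = ",") would leave it:
# later columns keep a leading space, and missing slots are NA.
dat1 <- data.frame(
  id = c(37, 95, 100),
  Genres1 = c("animation", "action", "comedy"),
  Genres2 = c(" adventure", " fantasy", " drama"),
  Genres3 = c(" family", NA, " fantasy"),
  Genres4 = c(" fantasy", NA, NA),
  Genres5 = c(" musical", NA, NA),
  stringsAsFactors = FALSE
)

genre_cols <- paste0("Genres", 1:5)

# 1 if any of the five columns holds `genre` (spaces trimmed), else 0;
# NA cells count as "not this genre" because FALSE & NA is FALSE.
has_genre <- function(df, genre) {
  hits <- lapply(df[genre_cols], function(x) !is.na(x) & trimws(x) == genre)
  as.integer(Reduce(`|`, hits))
}

dat1$adventure <- has_genre(dat1, "adventure")
dat1$action    <- has_genre(dat1, "action")
dat1$comedy    <- has_genre(dat1, "comedy")
dat1
```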
id Runtime Genres1 Genres2 Genres3 Genres4 Genres5 Adventure
37 75 animation adventure family fantasy musical 0
1 162 action adventure fantasy sci_fi 0
95 126 action fantasy 0
100 101 comedy drama fantasy 0
82 136 action adventure sci-fi 0
99 117 animation adventure comedy family sport 0
91 95 animation comedy crime family 0
The code detects adventure when it appears in Genres1, but returns zero when it appears in Genres2.
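A likely cause, inferred from the output shown rather than confirmed: sep = "," splits on the comma only, so every genre after the first keeps the space that followed it (" adventure"), and " adventure" == "adventure" is FALSE. The base-R sketch below reproduces the effect on one string and shows two fixes: trimming with trimws(), or splitting on a regular expression that also consumes the space. tidyr's separate() interprets a character sep as a regular expression too, so sep = ",\\s*" there should have the same effect.

```r
genres <- "action, adventure, fantasy, sci_fi"

# Splitting on "," alone keeps the space after each comma
parts_raw <- strsplit(genres, ",")[[1]]
print(parts_raw[2] == "adventure")            # FALSE: the value is " adventure"

# Fix 1: strip leading/trailing whitespace after splitting
parts_trimmed <- trimws(parts_raw)

# Fix 2: let the separator regex swallow the optional whitespace
parts_regex <- strsplit(genres, ",\\s*")[[1]]

print(parts_trimmed[2] == "adventure")        # TRUE
print(identical(parts_trimmed, parts_regex))  # TRUE
```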
I have re-edited the question. I tried the suggestions, but I am not sure how to apply them, as there are 3000 observations.
After running the suggestion, I listed the genres, formed a vector and assigned it to dat2:
dat2 <- c( "adventure", "comedy", "action", "drama", "animation", "fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror", "musical","history", "war", "documentary", "biography")
table(factor(dat2))
action adventure animation biography comedy documentary drama
1 1 1 1 1 1 1
family fantasy history horror musical mystery romance
1 1 1 1 1 1 1
sci-fi thriller war
1 1 1
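table(factor(dat2)) returns 1 for every genre simply because each name occurs once in dat2. The vector becomes useful as the levels argument when tabulating one movie's genres: every genre in dat2 then gets a cell, with 0 for genres the movie lacks. A short sketch using one of the Genres strings from the sample above:

```r
dat2 <- c("adventure", "comedy", "action", "drama", "animation", "fantasy",
          "mystery", "family", "sci-fi", "thriller", "romance", "horror",
          "musical", "history", "war", "documentary", "biography")

# Split one movie's Genres string and strip the spaces after the commas
one_movie <- trimws(strsplit("action, adventure, sci-fi", ",")[[1]])

# One cell per genre in dat2: 1 if present in this movie, 0 if absent
counts <- table(factor(one_movie, levels = dat2))
counts[c("action", "adventure", "comedy")]
```

Values that are not in dat2 become NA in factor() and are silently dropped by table(). The sample rows contain "sci_fi", "crime" and "sport", none of which are in dat2, so the levels vector should list every spelling that actually occurs in the data.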
Creating the function:
fun1 <- function("adventure", "comedy", "action", "drama", "animation",
"fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror",
"musical","history", "war", "documentary", "biography")) {
vector_of_cur_genres <- separate(i, sep = ", ")
result <- table(factor(vector_of_cur_genres, dat2))
return(result)
}
# Results
fun1 <- function("adventure", "comedy", "action", "drama",
"animation", "fantasy", "mystery", "family", "sci-fi", "thriller",
"romance", "horror", "musical","history", "war", "documentary",
"biography")) {
Error: unexpected string constant in "fun1 <- function("adventure""
> vector_of_cur_genres <- separate(i, sep = ", ")
Error: Please supply column name
> result <- table(factor(vector_of_cur_genres, dat2))
Error in factor(vector_of_cur_genres, dat2) :
object 'vector_of_cur_genres' not found
> return(result)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
mat <- mapply(fun1,dat2$Genres)
Error in match.fun(FUN) : object 'fun1' not found
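The errors cascade from the first line: R function arguments must be names, so function("adventure", ...) does not parse, fun1 is never defined, and every later line fails in turn. tidyr's separate() also expects a data frame and a column rather than a single string, and dat2$Genres cannot work because dat2 is a plain character vector (the Genres column lives in dat). One possible rewrite, sketched below, takes one Genres string as its only argument, uses base strsplit() instead of separate(), and is applied to every row with sapply(). The three-row dat is a hypothetical stand-in for the real data.

```r
dat2 <- c("adventure", "comedy", "action", "drama", "animation", "fantasy",
          "mystery", "family", "sci-fi", "thriller", "romance", "horror",
          "musical", "history", "war", "documentary", "biography")

# Hypothetical stand-in for the imported data (three of the rows shown above)
dat <- data.frame(
  id = c(37, 100, 82),
  Runtime = c(75, 101, 136),
  Genres = c("animation, adventure, family, fantasy, musical",
             "comedy, drama, fantasy",
             "action, adventure, sci-fi"),
  stringsAsFactors = FALSE
)

# Count each genre of dat2 in one comma-separated Genres string
fun1 <- function(genres_string) {
  vector_of_cur_genres <- trimws(strsplit(genres_string, ",")[[1]])
  result <- table(factor(vector_of_cur_genres, levels = dat2))
  as.integer(result)  # plain integer vector, one entry per genre in dat2
}

# sapply() gives one column per movie; t() makes rows = movies, cols = genres
mat <- t(sapply(dat$Genres, fun1))
colnames(mat) <- dat2
rownames(mat) <- NULL

# Keep the three indicators the question asks for
dat$Adventure <- mat[, "adventure"]
dat$Action    <- mat[, "action"]
dat$Comedy    <- mat[, "comedy"]
dat
```

To keep an indicator for every genre instead of just three, cbind(dat, mat) attaches all 17 columns at once.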