genres=c("Action","Adventure","Animation","Biography","Comedy","Crime",
 "Documentary","Drama","Family","Game.Show","Horror","Music","Musical",
 "Mystery","Romance","Sci.Fi","Short","Thriller","War","Western")

This is my vector of genres.

Another data set has columns with the same names as these genres.

These are the column names of that data set:

"Title"        "Genre"        "imdbRating"   "Release_Year" 
"Action"       "Adventure"    "Animation"    "Biography"    "Comedy"    
"Crime"        "Documentary"  "Drama"        "Family" 
"Fantasy"      "Game.Show"    "Horror"       "Music"
"Musical"      "Mystery"      "N.A"          "Romance"
"Sci.Fi"       "Short"        "Sport"        "Thriller"   
"War"          "Western"  

I want to run this command once for every genre, substituting each genre name into the command in turn.

     data_predict$genres[grepl("*genres*", data_predict$Genre)]=1

Original data set


        data_predict<-structure(list(Genre = structure(c(3L, 1L, 2L), .Label = c("Action, Adventure, Sci-Fi", 
"Action, Drama, War", "Sci-Fi"), class = "factor"), Action = c(0, 
0, 0), Adventure = c(0, 0, 0), Animation = c(0, 0, 0), Biography = c(0, 
0, 0), Comedy = c(0, 0, 0), Crime = c(0, 0, 0), Documentary = c(0, 
0, 0), Drama = c(0, 0, 0), Family = c(0, 0, 0), Game.Show = c(0, 
0, 0), Horror = c(0, 0, 0), Music = c(0, 0, 0), Musical = c(0, 
0, 0), Mystery = c(0, 0, 0), Romance = c(0, 0, 0), Sci.Fi = c(0, 
0, 0), Short = c(0, 0, 0), Thriller = c(0, 0, 0), War = c(0, 
0, 0), Western = c(0, 0, 0)), .Names = c("Genre", "Action", "Adventure", 
"Animation", "Biography", "Comedy", "Crime", "Documentary", "Drama", 
"Family", "Game.Show", "Horror", "Music", "Musical", "Mystery", 
"Romance", "Sci.Fi", "Short", "Thriller", "War", "Western"), row.names = c(NA, 
3L), class = "data.frame") 

Expected result

data_predicted<-structure(list(Genre = structure(c(3L, 1L, 2L), .Label = c("Action, Adventure, Sci-Fi", 
    "Action, Drama, War", "Sci-Fi"), class = "factor"), Action = c(0, 
    1, 1), Adventure = c(0, 1, 0), Animation = c(0, 0, 0), Biography = c(0, 
    0, 0), Comedy = c(0, 0, 0), Crime = c(0, 0, 0), Documentary = c(0, 
    0, 0), Drama = c(0, 0, 1), Family = c(0, 0, 0), Game.Show = c(0, 
    0, 0), Horror = c(0, 0, 0), Music = c(0, 0, 0), Musical = c(0, 
    0, 0), Mystery = c(0, 0, 0), Romance = c(0, 0, 0), Sci.Fi = c(0, 
    0, 0), Short = c(0, 0, 0), Thriller = c(0, 0, 0), War = c(0, 
    0, 1), Western = c(0, 0, 0)), .Names = c("Genre", "Action", "Adventure", 
    "Animation", "Biography", "Comedy", "Crime", "Documentary", "Drama", 
    "Family", "Game.Show", "Horror", "Music", "Musical", "Mystery", 
    "Romance", "Sci.Fi", "Short", "Thriller", "War", "Western"), row.names = c(NA, 
    3L), class = "data.frame")
  • The genres are "Sci-Fi", "Action, Adventure, Sci-Fi" and "Action, Drama, War". I want to update the columns that correspond to the genres in each row, e.g. if a row contains Action, Drama and War, then its Action, Drama and War columns should become 1. – Priyank Puri Jun 25 '15 at 18:39
  • I am using this code: `data_predict$Action[grepl("Action", data_predict$Genre)]=1 data_predict$Adventure[grepl("*Adventure*", data_predict$Genre)]=1 data_predict$Animation[grepl("*Animation*", data_predict$Genre)]=1 data_predict$Biography[grepl("*Biography*", data_predict$Genre)]=1 data_predict$Comedy[grepl("*Comedy*", data_predict$Genre)]=1 data_predict$Crime[grepl("*Crime*", data_predict$Genre)]=1 data_predict$Documentary[grepl("*Documentary*", data_predict$Genre)]=1 data_predict$Drama[grepl("*Drama*", data_predict$Genre)]=1` I want to reduce the number of steps. Can you help me with this? – Priyank Puri Jun 25 '15 at 18:53
  • As you can see in the dataset, I want to update only those columns which contain the genre. E.g. the second row contains Action, Adventure and Sci-Fi, so Action, Adventure and Sci-Fi should be set to 1 for this row. [link](http://s15.postimg.org/wumvwxtor/ask.png) – Priyank Puri Jun 25 '15 at 19:03
  • `paste(genres, collapse="|")` didn't work. – Priyank Puri Jun 25 '15 at 19:07
  • An image is not helpful, as I had to manually type in the data to test. Please check the guidelines [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – akrun Jun 25 '15 at 19:07
  • It says that I have exceeded the character limit. I used dput, but it gave me a very long structure for just the first 3 rows. – Priyank Puri Jun 25 '15 at 19:21
  • Looks like you have to loop here, i.e. `lapply(names(data_predict), function(x) {x1 <- data_predict[,x]; x1[grepl(paste0(".*?", x, ".*"), data_predict$Genre)] <- 1; x1})` (not tested) – akrun Jun 25 '15 at 19:22
  • Then try with `dput(droplevels(yourdata[1:5,1:4]))` – akrun Jun 25 '15 at 19:36

1 Answer

Try

    library(qdapTools)
    mtabulate(strsplit(as.character(data_predict$Genre), ', '))

Or

    data_predict[-1] <- lapply(names(data_predict)[-1],
        function(x) as.numeric(grepl(x, data_predict$Genre)))
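
A possible variation on the loop above, offered only as a sketch: `grepl()` treats each column name as a regular expression, so "Music" would also match a row whose genre is "Musical". The version below splits each `Genre` string and tests for exact membership instead. It uses a small stand-in for `data_predict` and a shortened `genres` vector, and the `labels` mapping rests on the assumption that a "." in a column name stands for a "-" in the `Genre` text (as in "Sci.Fi" and "Sci-Fi"); `labels` and `parts` are names introduced here for illustration.

```r
# Small stand-in for data_predict (Genre values taken from the question).
data_predict <- data.frame(
  Genre = c("Sci-Fi", "Action, Adventure, Sci-Fi", "Action, Drama, War"),
  stringsAsFactors = FALSE
)

# Shortened genre vector, for brevity.
genres <- c("Action", "Adventure", "Drama", "Game.Show", "Sci.Fi", "War")

# Assumption: "." in a column name stands for "-" in the Genre text.
labels <- gsub(".", "-", genres, fixed = TRUE)

# Split each Genre string into its individual genres.
parts <- strsplit(as.character(data_predict$Genre), ", ", fixed = TRUE)

# One 0/1 column per genre: 1 when the genre is among that row's parts.
for (i in seq_along(genres)) {
  data_predict[[genres[i]]] <- as.numeric(
    vapply(parts, function(p) labels[i] %in% p, logical(1))
  )
}
```

With this mapping the `Sci.Fi` column is set to 1 for rows containing "Sci-Fi"; if that is not wanted, drop the `gsub()` step and compare against `genres` directly.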
akrun