Extract parameters from a string and create other factor or logical variables

Question

I'm sorry if I couldn't make the question clear enough. I'm having difficulty forming the search keyword/phrase, so here I am.

So, I currently have a data frame something like this:

song.title <- c("A", "B", "C", "D")
genre <- c("rock", "pop, rock", "jazz, pop, funk", "funk, rock")
df <- data.frame(song.title, genre)
view(df)

Then, I need to convert the data frame to something like this:

song.title <- c("A", "B", "C", "D")
is.pop <- c("no", "yes", "yes", "no")
is.rock <- c("yes", "yes", "no", "yes")
is.jazz <- c("no", "no", "yes", "no")
is.funk <- c("no", "no", "yes", "yes")
df_needed <- data.frame(song.title, is.pop, is.rock, is.jazz, is.funk)
view(df_needed)

The actual data I'm working on has 10.000+ rows, so it's hard to determine how many factors (genres/tags) that should be "converted" to columns. What are my options to transform such kind of data in R? Thanks.

AndrewGB · Accepted Answer · 2022-01-05T05:27:52.320

Here is a tidyverse option. Here, I separate the lists in the genre column so that each element is in its own row. Then, I simply paste "is." to each of the genre words, as this will be used for column names when we pivot the table. Then, I create a new column with "yes" to note which genres are present for each song title. Then, we can pivot the table wide and replace NA values with "no". This solution will work well no matter how many genres/tags you have. The only caveat is that I'm assuming all genres/tags will be separated by a comma.

library(tidyverse)

df %>%
  separate_rows(genre, sep = ",") %>%
  mutate(genre = trimws(genre),
         genre = paste0("is.", genre),
         yn = "yes") %>%
  pivot_wider(names_from = "genre", values_from = "yn") %>%
  mutate(across(-song.title, replace_na, "no")) %>% 
  select(1, 3, 2, 4:5)

Output

# A tibble: 4 × 5
  song.title is.pop is.rock is.jazz is.funk
  <chr>      <chr>  <chr>   <chr>   <chr>  
1 A          no     yes     no      no     
2 B          yes    yes     no      no     
3 C          yes    no      yes     yes    
4 D          no     yes     no      yes

Data

df <-
  structure(list(
    song.title = c("A", "B", "C", "D"),
    genre = c("rock", "pop, rock", "jazz, pop, funk", "funk, rock")
  ),
  class = "data.frame",
  row.names = c(NA,-4L))

score 0 · Answer 2 · answered Jan 05 '22 at 04:45

We can use grepl here for a base R option:

is.pop <- ifelse(grepl("\\bpop\\b", df$genre), "yes", "no")
is.rock <- ifelse(grepl("\\brock\\b", df$genre), "yes", "no")
is.jazz <- ifelse(grepl("\\bjazz\\b", df$genre), "yes", "no")
is.funk <- ifelse(grepl("\\bfunk\\b", df$genre), "yes", "no")
df_needed <- data.frame(song.title, is.pop, is.rock, is.jazz, is.funk)
df_needed

  song.title is.pop is.rock is.jazz is.funk
1          A     no     yes      no      no
2          B    yes     yes      no      no
3          C    yes      no     yes     yes
4          D     no     yes      no     yes

Data:

song.title <- c("A", "B", "C", "D")
genre <- c("rock", "pop, rock", "jazz, pop, funk", "funk, rock")
df <- data.frame(song.title, genre)

This seems pretty cumbersome, especially since they said they don't know how many tags to expect — camille, Jan 05 '22 at 05:51

Extract parameters from a string and create other factor or logical variables

2 Answers2