I have a data frame with one column that contains strings of different lengths. For each row, I need to split the long string on the ', ' separator into individual string elements. Then, for each distinct individual string, I need to create a new column that contains a 1 if that string is present in the row and a 0 otherwise.
I've done it using the loops below. However, maybe there is a more elegant way of doing it, e.g., using some existing data wrangling package? Thanks a lot! Here is my code:
# Create an example data frame with one column with strings:
df <- data.frame(a = c("one, two, three",
                       "one, three",
                       "two, three, four, five",
                       "one, four, five",
                       "two"), stringsAsFactors = FALSE)
df
str(df$a)
# Split column 'a' into individual strings:
library(stringr)
split_list <- str_split(df$a, ", ")
split_list # the result is a list of character vectors, one per row
# Grab unique values of all strings:
unique_strings <- sort(unique(unlist(split_list)))
unique_strings
# For each string in unique_strings, create a column filled with zeros:
df[unique_strings] <- 0
df
# Replace a zero with a 1 in a column if that row contains that string:
for (row in seq_len(nrow(df))) {      # loop through rows (safe if df has 0 rows)
  for (string in split_list[[row]]) { # loop through the strings found in this row
    df[row, string] <- 1              # mark the matching column
  }
}
df
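For comparison, the nested loop above could perhaps be replaced by a single membership test per row. The following is only a rough base R sketch, not necessarily the most elegant option: it assumes `strsplit()` with `fixed = TRUE` is an acceptable stand-in for `str_split()`, and the names `indicators` and `result` are made up for illustration.

```r
# Same example data as above (column `a` holds the comma-separated strings).
df <- data.frame(a = c("one, two, three",
                       "one, three",
                       "two, three, four, five",
                       "one, four, five",
                       "two"), stringsAsFactors = FALSE)

# Base R alternative to str_split(): fixed = TRUE treats ", " literally.
split_list <- strsplit(df$a, ", ", fixed = TRUE)
unique_strings <- sort(unique(unlist(split_list)))

# For each row, test which of the unique strings it contains, then
# stack the per-row 0/1 vectors into a rows-by-strings matrix.
indicators <- matrix(
  unlist(lapply(split_list, function(x) as.integer(unique_strings %in% x))),
  nrow = nrow(df), byrow = TRUE,
  dimnames = list(NULL, unique_strings)
)

# Attach the indicator columns to the original data frame.
result <- cbind(df, indicators)
result
```

The indicator columns here are integers rather than doubles, and `result` is built as a new object instead of modifying `df` in place.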