I have a "messy" dataset that I'm struggling to convert to a tidy format. Here's what the data looks like:
make <- c("Honda", "Jeep", "Nissan", "Ford")
model <- c("Civic", "Wrangler", "Altima", "Focus")
year <- c(1996, 2000, 2005, 1988)
color <- c("red;green;blue", "red;blue", "purple;red;green;black", "yellow;white;blue")
car.df <- data.frame(make, model, year, color)
What I would like to do is to convert the data to tidy/long format by separating out the "color" field and then making each make/model/year/color a separate line. So the output would look something like this (shown just for the Honda and Jeep):
make.new <- c("Honda", "Honda", "Honda", "Jeep", "Jeep")
model.new <- c("Civic", "Civic", "Civic", "Wrangler", "Wrangler")
year.new <- c(1996, 1996, 1996, 2000, 2000)
color.new <- c("red", "green", "blue", "red", "blue")
car.df.new <- data.frame(make.new, model.new, year.new, color.new)
Any suggestions on how to do this? In the dataset, there can be many different colors so once you separate the color field into different columns, you would potentially have lots of different columns to tidy up (and therefore a different number of rows in the tidy data set at the end for each make/model/year).
Thanks for any helpful suggestions!
Steve