I'm new here. I have just started learning R.
I have this question:
Suppose I have a dataframe:
name = c("John", "John","John","John","Mark","Mark","Mark","Mark","Dave", "Dave","Dave","Dave")
color = c("red", "blue", "green", "yellow","red", "blue", "green", "yellow","red", "blue", "green", "yellow")
value = c( 1,2,1,3,5,5,3,2,4,6,7,8)
df = data.frame(name, color, value)
#View(df)
df
# name color value
# 1 John red 1
# 2 John blue 2
# 3 John green 1
# 4 John yellow 3
# 5 Mark red 5
# 6 Mark blue 5
# 7 Mark green 3
# 8 Mark yellow 2
# 9 Dave red 4
# 10 Dave blue 6
# 11 Dave green 7
# 12 Dave yellow 8
and I want it to look like this:
# names red blue green yellow
#1 John 1 2 1 3
#2 Mark 5 5 3 2
#3 Dave 4 6 7 8
That is, the entries in the first column (name) will become unique and the levels in the second column (color) will be new columns and the entries that will be in these new columns will come from the corresponding rows in the third column (value) in the original data frame.
I can accomplish this using the following:
library(dplyr)
df = df %>%
group_by(name) %>%
mutate(red = ifelse(color == "red", value, 0.0),
blue = ifelse(color == "blue", value, 0.0),
green = ifelse(color == "green", value, 0.0),
yellow = ifelse(color == "yellow", value, 0.0)) %>%
group_by(name) %>%
summarise_each(funs(sum), red, blue, green, yellow)
df
name red blue green yellow
1 Dave 4 6 7 8
2 John 1 2 1 3
3 Mark 5 5 3 2
But this would not be ideal if there are lots of levels in the color column. How would I go on doing that?
Thank you!