I have a question about spreading x and y values for plotting from a single column. I am trying to create x and y values from y2, and I use the tidyr spread function to do this.
test = data.frame(gr =rep(c("Gr1","Gr2"),each=3),
y1=rep(c("V1","V2"),each=3),
y2=c(12,122,132,14,144,244)
)
> test
gr y1 y2
1 Gr1 V1 12
2 Gr1 V1 122
3 Gr1 V1 132
4 Gr2 V2 14
5 Gr2 V2 144
6 Gr2 V2 244
I want to create x- and y-axis values from y2:
library(dplyr)
library(tidyr)
test2 <- test%>%
mutate(No=1:n())%>%
spread(y1,y2) #sorry there is no group by here
If I don't add the mutate(No=1:n()) line, it gives
Error: Duplicate identifiers for rows (1, 2, 3), (4, 5, 6)
Anyway, the output is
# A tibble: 6 x 4
# Groups: gr [2]
gr No V1 V2
* <fctr> <int> <dbl> <dbl>
1 Gr1 1 12 NA
2 Gr1 2 122 NA
3 Gr1 3 132 NA
4 Gr2 4 NA 14
5 Gr2 5 NA 144
6 Gr2 6 NA 244
library(ggplot2)
ggplot(data = test2 , aes(y = V2, x = V1)) +
geom_point(size=2,alpha=0.5,shape=21,aes(fill=gr))+
theme_bw()
This creates an empty plot, since no row has both a V1 and a V2 value.
If I use na.omit(), it deletes entire rows, and since every row contains an NA, nothing is left to plot.
I run into this trouble whenever I need to use the spread function. Sometimes I create two different data sets and then combine them, but I am looking for a more elegant solution.
The expected output
Thanks.
**Edit after @joran's comment**
test = data.frame(gr =rep(c("Gr1","Gr1"),each=3),
y1=rep(c("V1","V2"),each=3),
y2=c(12,122,132,14,144,244)
)
library(dplyr)
library(tidyr)
test2 <- test%>%
mutate(No=seq(1,6))%>%
spread(y1,y2)
> test2
gr No V1 V2
1 Gr1 1 12 NA
2 Gr1 2 122 NA
3 Gr1 3 132 NA
4 Gr1 4 NA 14
5 Gr1 5 NA 144
6 Gr1 6 NA 244
The expected output
> test2
gr No V1 V2
1 Gr1 1 12 14
2 Gr1 2 122 144
3 Gr1 3 132 244
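
For reference, here is a minimal sketch of one way the expected shape might be reached. It assumes that the row order within each gr/y1 pair defines which V1 value goes with which V2 value, which the question does not state outright. The idea is to restart the No index inside each group instead of numbering across the whole data frame, so that V1 and V2 rows share identifiers. The name test_wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the edit above
test <- data.frame(gr = rep(c("Gr1", "Gr1"), each = 3),
                   y1 = rep(c("V1", "V2"), each = 3),
                   y2 = c(12, 122, 132, 14, 144, 244))

test_wide <- test %>%
  group_by(gr, y1) %>%          # one group per gr/y1 combination
  mutate(No = row_number()) %>% # index restarts at 1 inside each group (assumed pairing rule)
  ungroup() %>%                 # drop grouping before reshaping
  spread(y1, y2)                # V1 and V2 now share the same (gr, No) identifiers

test_wide
```

Because the (gr, No) pairs are now unique per y1 level, spread no longer reports duplicate identifiers and no NA padding is needed. In newer tidyr versions, pivot_wider(names_from = y1, values_from = y2) could likely be used in place of spread.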