
I have a question about spreading x and y values for plotting from a single column. I am trying to create x and y values from y2, and I use the tidyr spread function to do this.

  test = data.frame(gr =rep(c("Gr1","Gr2"),each=3),
                    y1=rep(c("V1","V2"),each=3),
                    y2=c(12,122,132,14,144,244)  
                    )

 > test
   gr y1  y2
1 Gr1 V1  12
2 Gr1 V1 122
3 Gr1 V1 132
4 Gr2 V2  14
5 Gr2 V2 144
6 Gr2 V2 244

I want to create x- and y-axis values from y2:

 library(dplyr)
 library(tidyr)
  test2 <- test %>%
    mutate(No = 1:n()) %>%
    spread(y1, y2) # sorry, there is no group_by here

If I don't add the mutate(No=1:n()) line, it gives Error: Duplicate identifiers for rows (1, 2, 3), (4, 5, 6)

Anyway, the output is

# A tibble: 6 x 4
# Groups:   gr [2]
      gr    No    V1    V2
* <fctr> <int> <dbl> <dbl>
1    Gr1     1    12    NA
2    Gr1     2   122    NA
3    Gr1     3   132    NA
4    Gr2     4    NA    14
5    Gr2     5    NA   144
6    Gr2     6    NA   244


library(ggplot2)  
  ggplot(data = test2 , aes(y = V2, x = V1)) +
  geom_point(size=2,alpha=0.5,shape=21,aes(fill=gr))+
  theme_bw()

This creates an empty plot, since no row has both a V1 and a V2 value.

If I use na.omit(), it deletes entire rows. I run into this problem whenever I need to use the spread function. Sometimes I create two separate data sets and then combine them, but I am looking for a more elegant solution.
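To make the cause of the empty plot concrete, here is a small check on the spread result shown above. It is only a sketch that rebuilds that table by hand (the values are copied from the printed output) and counts the rows where both V1 and V2 are present.

```r
# Hand-built copy of the spread output printed above
test2 <- data.frame(gr = rep(c("Gr1", "Gr2"), each = 3),
                    No = 1:6,
                    V1 = c(12, 122, 132, NA, NA, NA),
                    V2 = c(NA, NA, NA, 14, 144, 244))

# Rows that have both an x (V1) and a y (V2) value, i.e. rows that can be drawn as points
n_plottable <- sum(complete.cases(test2[, c("V1", "V2")]))
n_plottable
```

Because No is unique for every row, each row keeps only one of the two values, so there is nothing to plot.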

The expected output:

[image: expected plot]

Thanks.

**Edit after @joran's comment**

  test = data.frame(gr =rep(c("Gr1","Gr1"),each=3),
                        y1=rep(c("V1","V2"),each=3),
                        y2=c(12,122,132,14,144,244)  
                        )

 library(dplyr)
 library(tidyr)
  test2 <- test%>%

    mutate(No=seq(1,6))%>%
    spread(y1,y2)

> test2
   gr No  V1  V2
1 Gr1  1  12  NA
2 Gr1  2 122  NA
3 Gr1  3 132  NA
4 Gr1  4  NA  14
5 Gr1  5  NA 144
6 Gr1  6  NA 244

The expected output

> test2
   gr No  V1  V2
1 Gr1  1  12  14
2 Gr1  2 122 144
3 Gr1  3 132 244
Alexander
  • (1) What in your data explicitly ties the x value 12 to the y value 14? Nothing. You would need a repeating sequence by group. (2) You'll have to ditch the gr variable. Think about it, if 12 and 14 are x/y pair that go in the same row, does that row get Gr1 or Gr2? You can't have both in the same variable. – joran Feb 03 '18 at 00:58
  • @joran 100% agree with you. However, even if you make Gr1 a repeating sequence, I still get NA values. Something is missing! – Alexander Feb 03 '18 at 01:28
  • You're right! What's missing is my point (1). – joran Feb 03 '18 at 01:45
  • @joran even if you ditch `gr`, I still have NA values. The question is how to bring those values onto the same row. This post looks similar to my question: [using-spread-with-duplicate-identifiers-for-rows](https://stackoverflow.com/questions/39053451/using-spread-with-duplicate-identifiers-for-rows). @aliawadh980's solution is similar to what I want to get. – Alexander Feb 03 '18 at 01:53
  • @Nate it doesn't matter for now. You can set it to `gr` as default. I'll take care of it later. – Alexander Feb 03 '18 at 01:56
  • You misunderstood my point (1). You need a column that does ` 1 2 3 1 2 3`. That's what I mean by "repeating column". – joran Feb 03 '18 at 01:57
  • `test %>% mutate(id = rep(1:3, times = 2)) %>% spread(y1, y2)` – Nate Feb 03 '18 at 01:59
  • @joran OK, now I understand your point. Let me try this approach on my real data, because there are many grouping factors in it! – Alexander Feb 03 '18 at 02:01
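
The repeating index that joran and Nate describe in the comments can also be generated per `y1` level instead of being hard-coded, which may carry over more easily to data with unequal or many groups. A minimal sketch, assuming the rows within each `y1` level are already in the order in which they should be paired; the column name `id` is illustrative and not part of the original data.

```r
library(dplyr)
library(tidyr)

# Data from the edited question (a single gr level)
test <- data.frame(gr = rep("Gr1", 6),
                   y1 = rep(c("V1", "V2"), each = 3),
                   y2 = c(12, 122, 132, 14, 144, 244))

test2 <- test %>%
  group_by(y1) %>%
  mutate(id = row_number()) %>%  # 1 2 3 within V1, then 1 2 3 within V2
  ungroup() %>%
  spread(y1, y2)                 # rows sharing gr and id end up on one line

test2
```

With real data that has further grouping factors, those would presumably be added to `group_by()` alongside `y1`, provided each of them is constant within an x/y pair.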

1 Answer

library(tidyr)

test = data.frame(gr =rep(c("Gr1","Gr1"),each=3),
                    y1=rep(c("V1","V2"),each=3),
                    y2=c(12,122,132,14,144,244)  
                    )

# `gr` must take a distinct value for each row within a `y1` level (and repeat across levels)
# so that spread() can pair V1 and V2 values; otherwise you get an error or `NA` in the result
test$gr <- rep(c("gr1", "gr2", "gr3"), 2)

# then spread `test`
spread(test, y1, y2)
##    gr  V1  V2
## 1 gr1  12  14
## 2 gr2 122 144
## 3 gr3 132 244
yang
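
For the original two-group data, the same idea can be taken through to the plot. The following is a sketch, not the answerer's method: it drops `gr` (as joran's point (2) suggests, a paired row cannot belong to both Gr1 and Gr2), builds a per-`y1` index named `id` (an illustrative name), and uses `pivot_wider()`, which assumes tidyr 1.0.0 or later, in place of `spread()`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the original question (two gr levels)
test <- data.frame(gr = rep(c("Gr1", "Gr2"), each = 3),
                   y1 = rep(c("V1", "V2"), each = 3),
                   y2 = c(12, 122, 132, 14, 144, 244))

wide <- test %>%
  select(-gr) %>%                 # gr cannot be kept: each pair would need two values
  group_by(y1) %>%
  mutate(id = row_number()) %>%   # pairing key: n-th V1 goes with n-th V2
  ungroup() %>%
  pivot_wider(names_from = y1, values_from = y2)

# Every row now has both coordinates, so the points can be drawn
p <- ggplot(wide, aes(x = V1, y = V2)) +
  geom_point(size = 2, alpha = 0.5, shape = 21, fill = "grey50") +
  theme_bw()
```

Printing `p` draws the three paired points. The fill is a fixed colour here because `gr` is no longer available to map to it.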