I'm having more problems with the tidyr package in R. I am running an experiment whose data need to be split into plot, plant, and leaf variables, and because the data frame is large, I need to do this in code rather than by hand. I'm using RStudio with the tidyr package.

I need to organize a data frame from this:

library(readr)
library(tidyr)
library(dplyr)

plot <- c("101","101","101","101","101","102","102","102","102","102")
plant <- c("1","2","3","4","5","1","2","3","4","5")
leaf_1 <- c("100","100","100","100","100","100","100","100","100","100")
leaf_2 <- c("90","90","90","90","90","90","90","90","90","90")
leaf_3 <- c("80","80","80","80","80","80","80","80","80","80")

plot <- as.data.frame(plot)
plant <- as.data.frame(plant)
leaf_1 <- as.data.frame(leaf_1)
leaf_2 <- as.data.frame(leaf_2)
leaf_3 <- as.data.frame(leaf_3)

data <- cbind(plot, plant, leaf_1, leaf_2, leaf_3)
View(data)

Into this:

plot <- c("101","101","101", "101","101","101","101","101","101","101","101","101","101","101","101")
plant <- c("1","1","1","2","2","2","3","3","3","4","4","4","5","5","5")
leaf_number <- c("1","2","3","1","2","3","1","2","3","1","2","3","1","2","3")
score <- c("100","90","80","100","90","80","100","90","80","100","90","80","100","90","80")

plot <- as.data.frame(plot)
plant <- as.data.frame(plant)
leaf_number <- as.data.frame(leaf_number)
score <- as.data.frame(score)

example <- cbind(plot, plant, leaf_number, score)
View(example)

Here is what I have already tried:

data1 <- gather(data, leaf_number, score, -plot)

But this doesn't reshape the data frame into what I need. Any help is greatly appreciated, thanks so much everybody!

camille
ihb
  • Not related to your problem, but that is a really convoluted way to generate a data frame. Make your life easier: `data <- data.frame(plot = c("101","101", ...), plant = c("1","2", ...))` and so on. Also, all your values are numeric, so why put them in quotes? There is no need. – neilfws Sep 16 '19 at 23:02
  • Or another way is `tidyr::gather(data, leaf_number, score, starts_with("leaf"))` – Ronak Shah Sep 16 '19 at 23:58
  • If you update your version to tidyr 1.0.0, here is an option using the new `pivot_longer`: `data %>% pivot_longer(cols = starts_with("leaf_"), names_to = "leaf_number", names_prefix = "leaf_", values_to = "score")` – Dave2e Sep 17 '19 at 00:02
  • Thank you for those comments, they help me out a lot!! – ihb Sep 17 '19 at 16:50
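
For readers who want to try the `pivot_longer` suggestion from the comment above, here is a minimal sketch. It assumes tidyr 1.0.0 or later and rebuilds the question's data with numeric columns; the object name `long` is only illustrative and is not from the thread.

```r
library(tidyr)

# Same values as the question's data, built in one call with numeric columns.
# Length-1 values (100, 90, 80) are recycled to the 10 rows.
data <- data.frame(
  plot   = rep(c(101, 102), each = 5),
  plant  = rep(1:5, times = 2),
  leaf_1 = 100,
  leaf_2 = 90,
  leaf_3 = 80
)

# Stack the three leaf_* columns into name/value pairs.
# names_prefix strips "leaf_" so leaf_number holds "1", "2", "3".
long <- pivot_longer(
  data,
  cols         = starts_with("leaf_"),
  names_to     = "leaf_number",
  names_prefix = "leaf_",
  values_to    = "score"
)

# leaf_number comes back as character; convert it if a number is wanted.
long$leaf_number <- as.integer(long$leaf_number)
```

With this approach each original row expands into three consecutive rows (one per leaf), so the result should already be grouped by plot and plant without an extra sort.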

1 Answer

library(tidyr)

# Build the data frame in one call, with numeric (unquoted) values
data <- data.frame(
  plot = c(101,101,101,101,101,102,102,102,102,102),
  plant = c(1,2,3,4,5,1,2,3,4,5),
  leaf_1 = c(100,100,100,100,100,100,100,100,100,100),
  leaf_2 = c(90,90,90,90,90,90,90,90,90,90),
  leaf_3 = c(80,80,80,80,80,80,80,80,80,80)
)

# Exclude both identifier columns from the gather, not just plot;
# with -plot alone, the plant column is gathered along with the leaf columns
gather(data, leaf_number, score, -c(plot, plant))
#   plot plant leaf_number score
#1   101     1      leaf_1   100
#2   101     2      leaf_1   100
#3   101     3      leaf_1   100
#4   101     4      leaf_1   100
#5   101     5      leaf_1   100
#6   102     1      leaf_1   100
#7   102     2      leaf_1   100
#etc.
Jon Spring
  • ... and you could add `%>% arrange(plot, plant)` to get the same ordering as your example, if that helps. – Jon Spring Sep 16 '19 at 23:08
  • ...and you could use `%>% mutate(leaf_number = parse_number(leaf_number))` if you want the number without the "leaf_" prefix. – Jon Spring Sep 16 '19 at 23:09
  • Awesome, thanks so much, I learned a lot from this thread and I really appreciate it!! – ihb Sep 17 '19 at 16:51
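
Putting the answer and its two follow-up comments together, the whole reshape might look like the sketch below. It is one possible combination rather than code posted in the thread: the object name `tidy` is illustrative, and `leaf_number` is added to `arrange()` as an assumption so the leaf order within each plant is explicit instead of relying on the order `gather` happens to produce.

```r
library(tidyr)
library(dplyr)
library(readr)  # provides parse_number()

data <- data.frame(
  plot   = c(101,101,101,101,101,102,102,102,102,102),
  plant  = c(1,2,3,4,5,1,2,3,4,5),
  leaf_1 = c(100,100,100,100,100,100,100,100,100,100),
  leaf_2 = c(90,90,90,90,90,90,90,90,90,90),
  leaf_3 = c(80,80,80,80,80,80,80,80,80,80)
)

tidy <- data %>%
  # Keep plot and plant as identifiers; gather only the leaf_* columns
  gather(leaf_number, score, -c(plot, plant)) %>%
  # "leaf_1" -> 1, "leaf_2" -> 2, ...
  mutate(leaf_number = parse_number(leaf_number)) %>%
  # Order rows the way the question's target data frame is laid out
  arrange(plot, plant, leaf_number)

head(tidy, 3)
```

The result has one row per plot, plant, and leaf (30 rows for this data). The tidyr documentation now describes `gather()` as superseded by `pivot_longer()`, so the `pivot_longer` form mentioned in the question's comments may be the better choice for new code.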