1

I have a data frame that I would like to stack in R so that I end up with three columns. Below cis some example data in its current format.

> dput(df)
structure(list(Day = c("d1", "d2", "d3", "d4", "d5", "d6", "d7", 
"d8", "d9", "d10"), A1 = c(14L, 24L, 22L, NA, NA, NA, NA, NA, 
NA, NA), A2 = c(9L, 15L, 34L, 2L, 12L, 34L, 234L, 34L, NA, NA
), A3 = c(3L, 4L, 19L, 76L, 34L, 34L, 23L, 24L, 37L, 44L), A1.1 = c(34L, 
55L, 75L, 12L, 56L, 35L, 3L, 22L, NA, NA)), .Names = c("Day", 
"A1", "A2", "A3", "A1.1"), class = "data.frame", row.names = c(NA, 
-10L))

> df
   Day A1  A2 A3 A1.1
1   d1 14   9  3   34
2   d2 24  15  4   55
3   d3 22  34 19   75
4   d4 NA   2 76   12
5   d5 NA  12 34   56
6   d6 NA  34 34   35
7   d7 NA 234 23    3
8   d8 NA  34 24   22
9   d9 NA  NA 37   NA
10 d10 NA  NA 44   NA

I would like to end up with the dataframe formatted like so with three columns entitles "Day", "Animal" and "Count":

   Day  Animal  Count
d1  A1  14
d2  A1  24
d3  A1  22
d4  A1  NA
d5  A1  NA
d6  A1  NA
d7  A1  NA
d8  A1  NA
d9  A1  NA
d10 A1  NA
d1  A2  9
d2  A2  15
d3  A2  34
d4  A2  2
d5  A2  12
d6  A2  34
d7  A2  234
d8  A2  34
d9  A2  NA
d10 A2  NA
d1  A3  3
d2  A3  4
d3  A3  19
d4  A3  76
d5  A3  34
d6  A3  34
d7  A3  23
d8  A3  24
d9  A3  37
d10 A3  44
d1  A1  34
d2  A1  55
d3  A1  75
d4  A1  12
d5  A1  56
d6  A1  35
d7  A1  3
d8  A1  22
d9  A1  NA
d10 A1  NA

I know that this should be an easy task but I am really struggling to find the solution. Any help with be much appreciated.

jjulip
  • 1,093
  • 4
  • 16
  • 24

3 Answers3

5

With dplyr and tidyr:

library(dplyr)
library(tidyr)
df <- df %>% 
  gather("animal", "count", -Day)


head(df)
#   Day animal count
# 1  d1     A1    14
# 2  d2     A1    24
# 3  d3     A1    22
# 4  d4     A1    NA
# 5  d5     A1    NA
# 6  d6     A1    NA
RLave
  • 8,144
  • 3
  • 21
  • 37
  • 1
    Amazing! Thank you so much! Appreciated. Could you add library(tidyr) to your answer for the gather function please? – jjulip Nov 08 '18 at 11:04
  • I have updated my question as I have a large data frame with multiple blocks of data for the same animal and Days but different values and your solution does not work in this case. Any suggestions? This is the error message that I get: Error: Can't bind data because some arguments have the same name Call `rlang::last_error()` to see a backtrace – jjulip Nov 08 '18 at 11:17
  • It is working for me even with the new data, try cleaning your workspace first then retry – RLave Nov 08 '18 at 11:20
  • No. I've tried multiple times with multiple examples and I get the same error every time :( – jjulip Nov 08 '18 at 11:41
  • which version of `dplyr` and `tidyr` do you have? See `packageVersion("dplyr")`. Try update to the last version by reinstalling both: `devtools::install_github("hadley/dplyr")` or `install_packages("dplyr")`. – RLave Nov 08 '18 at 13:00
1

You want to format your data frame from a wide format to a long format using the melt function of the reshape 2 package also answered here

library(reshape2)

df <- structure(list(Day = c("d1", "d2", "d3", "d4", "d5", "d6", "d7", 
                       "d8", "d9", "d10"), A1 = c(14L, 24L, 22L, NA, NA, NA, NA, NA, 
                                                  NA, NA), A2 = c(9L, 15L, 34L, 2L, 12L, 34L, 234L, 34L, NA, NA
                                                  ), A3 = c(3L, 4L, 19L, 76L, 34L, 34L, 23L, 24L, 37L, 44L)), .Names = c("Day", 
                                                                                                                         "A1", "A2", "A3"), class = "data.frame", row.names = c(NA, -10L
                                                                                                                         ))
long_format <- melt(df)
colnames(long_format)[2:3] <- c("Animal","Count")
seth-1
  • 11
  • 2
  • How does melt determine what the id variable is? Or how would I specify this? In my huge dataset it is having trouble doing this. – jjulip Nov 08 '18 at 11:29
  • @jjulip you can try `?melt.data.frame` to get more information on the function. See e.g. under **Arguments** and _id.vars_. – seth-1 Nov 08 '18 at 11:50
0

It can be done with base R function melt:

melt(df)

OUTPUT :

Day variable value
1   d1       A1    14
2   d2       A1    24
3   d3       A1    22
4   d4       A1    NA
5   d5       A1    NA
6   d6       A1    NA
7   d7       A1    NA
8   d8       A1    NA
9   d9       A1    NA
10 d10       A1    NA
11  d1       A2     9
12  d2       A2    15
13  d3       A2    34
14  d4       A2     2
15  d5       A2    12
16  d6       A2    34
17  d7       A2   234
18  d8       A2    34
19  d9       A2    NA
20 d10       A2    NA
21  d1       A3     3
22  d2       A3     4
23  d3       A3    19
24  d4       A3    76
25  d5       A3    34
26  d6       A3    34
27  d7       A3    23
28  d8       A3    24
29  d9       A3    37
30 d10       A3    44
sai saran
  • 737
  • 9
  • 32