1

I have the following data-set

ID  COL1    COL2    COL3
1   22      12      NA
2   2       NA      NA
3   1       2       4
4   NA      NA      NA

The above data needs to be converted into the following format

ID  VALUE
1   22
1   12
2   2
3   1
3   2
3   4

Please note that NAs are present in the source data frame which should be ignored in the final table.

Frank
  • 66,179
  • 8
  • 96
  • 180
SoakingHummer
  • 562
  • 1
  • 7
  • 25
  • Use `tidyr::gather` – Luiz Rodrigo Jul 27 '17 at 15:34
  • `reshape2::melt(DF, id = "ID", na.rm = TRUE)` gets you almost there. (You'd need to install the reshape2 package.) – Frank Jul 27 '17 at 15:35
  • 2
    @Jaap's answer in the linked Q&A covers the `na.rm =` argument to various functions (melt in reshape2, melt in data.table, gather in tidyr). If you don't want to install a new package, there's `subset(cbind(DF[1], v = unlist(DF[-1])), !is.na(v))` from base or @d.b's answer below. – Frank Jul 27 '17 at 15:50
  • Related: https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format – Frank Jul 27 '17 at 15:53

4 Answers4

4

For speed with the larger datasets, use the data.table melt method:

library("data.table")
setDT(df)
melt(df, id.vars = "ID", na.rm = TRUE)
#    ID variable value
# 1:  1     COL1    22
# 2:  2     COL1     2
# 3:  3     COL1     1
# 4:  1     COL2    12
# 5:  3     COL2     2
# 6:  3     COL3     4
mlegge
  • 6,763
  • 3
  • 40
  • 67
3
library(dplyr)
library(tidyr)

gather(df, column, value, COL1:COL3, na.rm=TRUE) %>%
  select(-column)
jdb
  • 147
  • 7
Alex P
  • 1,574
  • 13
  • 28
2

In base R, you could use lapply to go through columns and extract non NA elements and corresponding ID.

do.call(rbind, lapply(df[,-1], function(x)
    data.frame(ID = df$ID[!is.na(x)], VALUE = x[!is.na(x)])))
#       ID VALUE
#COL1.1  1    22
#COL1.2  2     2
#COL1.3  3     1
#COL2.1  1    12
#COL2.2  3     2
#COL3    3     4

If necessary, the order can be changed in one additional step

df2 = do.call(rbind, lapply(df[,-1], function(x)
    data.frame(ID = df$ID[!is.na(x)], VALUE = x[!is.na(x)])))
do.call(rbind, split(df2, df2$ID))
#         ID VALUE
#1.COL1.1  1    22
#1.COL2.1  1    12
#2         2     2
#3.COL1.3  3     1
#3.COL2.2  3     2
#3.COL3    3     4

DATA

df = structure(list(ID = 1:4, COL1 = c(22L, 2L, 1L, NA), COL2 = c(12L, 
NA, 2L, NA), COL3 = c(NA, NA, 4L, NA)), .Names = c("ID", "COL1", 
"COL2", "COL3"), class = "data.frame", row.names = c(NA, -4L))
d.b
  • 32,245
  • 6
  • 36
  • 77
1

Here is a base R option

d1 <- na.omit(data.frame(ID = rep(df1$ID, each = ncol(df1)-1), VALUE = c(t(df1[-1]))))
d1
#  ID VALUE
#1  1    22
#2  1    12
#4  2     2
#7  3     1
#8  3     2
#9  3     4

Or we can use a compact option with data.table

library(data.table)
setDT(df1)[, unlist(.SD), .(ID)][!is.na(V1)]
akrun
  • 874,273
  • 37
  • 540
  • 662