Converting single row to multiple rows, ignoring NAs

Question

I have the following dataset:

ID  COL1    COL2    COL3
1   22      12      NA
2   2       NA      NA
3   1       2       4
4   NA      NA      NA

The above data needs to be converted into the following format

Please note that the source data frame contains NAs, which should be ignored in the final table.

`reshape2::melt(DF, id = "ID", na.rm = TRUE)` gets you almost there. (You'd need to install the reshape2 package.) — Frank, Jul 27 '17 at 15:35
@Jaap's answer in the linked Q&A covers the `na.rm =` argument to various functions (melt in reshape2, melt in data.table, gather in tidyr). If you don't want to install a new package, there's `subset(cbind(DF[1], v = unlist(DF[-1])), !is.na(v))` from base or @d.b's answer below. — Frank, Jul 27 '17 at 15:50
Related: https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format — Frank, Jul 27 '17 at 15:53
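
The base R one-liner from the comment above can be run as follows. This is a minimal sketch: `DF` is assumed to be the question's data frame, rebuilt here from the sample values, and the result name `long` is only for illustration.

```r
# Sample data from the question (assumed name: DF)
DF <- data.frame(
  ID   = 1:4,
  COL1 = c(22L, 2L, 1L, NA),
  COL2 = c(12L, NA, 2L, NA),
  COL3 = c(NA, NA, 4L, NA)
)

# unlist() stacks COL1, COL2, COL3 into one vector (column by column);
# cbind() recycles the ID column alongside it, and subset() drops the NAs
long <- subset(cbind(DF[1], v = unlist(DF[-1])), !is.na(v))
long
```

The rows come out column by column (all of COL1, then COL2, then COL3), and ID 4 disappears because all of its values are NA.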

score 4 · Answer 1 · answered Jul 27 '17 at 15:46

For speed with larger datasets, use the data.table melt method:

library("data.table")
setDT(df)
melt(df, id.vars = "ID", na.rm = TRUE)
#    ID variable value
# 1:  1     COL1    22
# 2:  2     COL1     2
# 3:  3     COL1     1
# 4:  1     COL2    12
# 5:  3     COL2     2
# 6:  3     COL3     4

score 3 · Accepted Answer · edited Jul 27 '17 at 15:39

library(dplyr)
library(tidyr)

gather(df, column, value, COL1:COL3, na.rm=TRUE) %>%
  select(-column)
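
In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. A minimal sketch of what appears to be the equivalent call, assuming `df` is the data frame from the question (rebuilt here so the block runs on its own); the names `column`, `value` and `long` are carried over or chosen for illustration:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  ID   = 1:4,
  COL1 = c(22L, 2L, 1L, NA),
  COL2 = c(12L, NA, 2L, NA),
  COL3 = c(NA, NA, 4L, NA)
)

# values_drop_na = TRUE drops the rows whose value is NA
long <- df %>%
  pivot_longer(COL1:COL3, names_to = "column", values_to = "value",
               values_drop_na = TRUE) %>%
  select(-column)
long
```

Unlike `gather()`, `pivot_longer()` returns the rows grouped by original row (here, by ID) rather than by source column.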

Alex P · answered Jul 27 '17 at 15:36 · edited by jdb

d.b · Answer 3 · 2017-07-27T15:43:14.470

In base R, you could use lapply to go through the value columns and, for each one, extract the non-NA elements together with the corresponding IDs.

do.call(rbind, lapply(df[,-1], function(x)
    data.frame(ID = df$ID[!is.na(x)], VALUE = x[!is.na(x)])))
#       ID VALUE
#COL1.1  1    22
#COL1.2  2     2
#COL1.3  3     1
#COL2.1  1    12
#COL2.2  3     2
#COL3    3     4

If necessary, the rows can be reordered by ID in one additional step:

df2 = do.call(rbind, lapply(df[,-1], function(x)
    data.frame(ID = df$ID[!is.na(x)], VALUE = x[!is.na(x)])))
do.call(rbind, split(df2, df2$ID))
#         ID VALUE
#1.COL1.1  1    22
#1.COL2.1  1    12
#2         2     2
#3.COL1.3  3     1
#3.COL2.2  3     2
#3.COL3    3     4

DATA

df = structure(list(ID = 1:4, COL1 = c(22L, 2L, 1L, NA), COL2 = c(12L, 
NA, 2L, NA), COL3 = c(NA, NA, 4L, NA)), .Names = c("ID", "COL1", 
"COL2", "COL3"), class = "data.frame", row.names = c(NA, -4L))

akrun · Answer 4 · 2017-07-27T16:49:03.830

Here is a base R option, where `df1` is the data frame from the question:

d1 <- na.omit(data.frame(ID = rep(df1$ID, each = ncol(df1)-1), VALUE = c(t(df1[-1]))))
d1
#  ID VALUE
#1  1    22
#2  1    12
#4  2     2
#7  3     1
#8  3     2
#9  3     4

Or we can use a compact option with data.table

library(data.table)
setDT(df1)[, unlist(.SD), .(ID)][!is.na(V1)]
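
The data.table snippet above leaves the value column with the auto-generated name `V1`, and `setDT()` converts `df1` in place. A minimal sketch of a variant that names the column `VALUE` and works on a copy instead; `df1` is assumed to hold the question's data, and `dt` and `res` are illustrative names:

```r
library(data.table)

# Sample data from the question (assumed name: df1)
df1 <- data.frame(
  ID   = 1:4,
  COL1 = c(22L, 2L, 1L, NA),
  COL2 = c(12L, NA, 2L, NA),
  COL3 = c(NA, NA, 4L, NA)
)

# as.data.table() makes a copy, so df1 itself stays a plain data.frame
dt <- as.data.table(df1)

# For each ID, flatten the remaining columns into one vector, then drop NAs
res <- dt[, .(VALUE = unlist(.SD)), by = ID][!is.na(VALUE)]
res
```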

answered Jul 27 '17 at 16:19 · edited Jul 27 '17 at 16:49
