1

I have a dataset1 which is as follows:

dataset1 <- data.frame(  
   id1 = c(1, 1, 1, 2, 2, 2),    
   id2 = c(122, 122, 122, 133, 133, 133),  
   num1 = c(1, NA, NA, 50,NA, NA),  
   num2 = c(NA, 2, NA, NA, 45, NA),  
   num3 = c(NA, NA, 3, NA, NA, 4)  
 )

How to convert multiple rows into a single row?

The desired output is:

id1, id2, num1, num2, num3   
1    122   1     2      3      
2    133   50    45     4  
Jaap
  • 81,064
  • 34
  • 182
  • 193

3 Answers3

1
library(dplyr)

dataset1 %>% group_by(id1, id2) %>%
  summarise_all(funs(.[!is.na(.)])) %>%
  as.data.frame()

#   id1 id2 num1 num2 num3
# 1   1 122    1    2    3
# 2   2 133   50   45    4

Note: Assuming there will be only 1 non-NA item in a column.

MKR
  • 19,739
  • 4
  • 23
  • 33
0

Using data.table

library(data.table)
data.table(dataset1)[, lapply(.SD, sum, na.rm = TRUE), by = c("id1", "id2")]

#   id1 id2 num1 num2 num3
#1:   1 122    1    2    3
#2:   2 133   50   45    4
nghauran
  • 6,648
  • 2
  • 20
  • 29
0

You can use dplyr to achieve that:

library(dplyr)
dataset1 %>% 
  group_by(id1, id2) %>% 
  mutate(
    num1 = sum(num1, na.rm=T),
    num2 = sum(num2, na.rm=T),
    num3 = sum(num3, na.rm=T)
  ) %>% 
  distinct()

Output:

enter image description here

This is also assuming if there's a repeated value in any of the variable we're going to sum it (if id1 = 1 has two values for num1, we're going to sum the value). If you're confident that every id has only one possible value for each of the num (num1 to num3), then don't worry about it.

onlyphantom
  • 8,606
  • 4
  • 44
  • 58