0

If two data frames are

  symbol wgt
1     A   2
2     C   4
3     D   6

  symbol wgt
1     A  20
2     D  10

how can I add them so that missing observations for a "symbol" in either data frame are treated as zero, giving

  symbol wgt
1     A   22
2     C   4
3     D   16
Fortranner
  • 2,525
  • 2
  • 22
  • 25
  • Perhaps you could exhance your example data by adding a row with E=16 to the second `data.frame` to support the case *so that missing observations for a "symbol" in either data frame are treated as zero* – R Yoda Jul 25 '20 at 15:43
  • What is the expected handling of `NA` values? – R Yoda Jul 25 '20 at 15:55

3 Answers3

3

You can join the two dataframes by symbol , replace NA with 0 and add the two weights.

library(dplyr)

df1 %>%
  left_join(df2, by = 'symbol') %>%
   mutate(wgt.y  = replace(wgt.y, is.na(wgt.y), 0), 
          wgt  = wgt.x + wgt.y) %>%
  select(-wgt.x, -wgt.y)

#  symbol wgt
#1      A  22
#2      C   4
#3      D  16

data

df1 <- structure(list(symbol = c("A", "C", "D"), wgt = c(2L, 4L, 6L)),
class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(symbol = c("A", "D"), wgt = c(20L, 10L)),
class = "data.frame", row.names = c(NA, -2L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

Try this one line solution by pipes:

#Data
library(dplyr)
df1 <- structure(list(symbol = c("A", "C", "D"), wgt = c(2L, 4L, 6L)), class = "data.frame", row.names = c("1", 
"2", "3"))
df2 <- structure(list(symbol = c("A", "D"), wgt = c(20L, 10L)), class = "data.frame", row.names = c("1", 
"2"))

#Code
df1 %>% left_join(df2,by = 'symbol') %>% mutate(wgt = rowSums(.[-1],na.rm=T)) %>% select(c(1,4))

  symbol wgt
1      A  22
2      C   4
3      D  16
Duck
  • 39,058
  • 13
  • 42
  • 84
1

With data.table and the data provided in the answer of @RonakShah and @Duck the solution could be a simple aggregation:

# Convert data.frame to data.table (very fast since inplace)
setDT(df1)
setDT(df2)

# combine both data.frames into one data.frame, group by symbol, apply the sum (NAs are ignored = counted as zero)
rbind(df1,df2)[, sum(wgt, na.rm = TRUE), by = symbol]

# Output
   symbol V1
1:      A 22
2:      C  4
3:      D 16

Note: If you want to use base R only (without data.table) you could use aggregate instead:

aggregate(wgt ~ symbol, rbind(df1,df2), sum)
R Yoda
  • 8,358
  • 2
  • 50
  • 87