R: How can I merge two or more data frames by adding their values?

Question

I have two data frames like this:

dat1
  col   n
1  A    1
2  B    1
3  C    2


dat2
  col   n
1  A    2
2  B    1
3  C    1
4  D    1

and I want to combine dat1 and dat2 into a data frame like this:

dat3
  col   n
1  A    3
2  B    2
3  C    3
4  D    1

I'm trying to build this data frame (dat3) with dplyr's bind_rows, group_by and count, but I can't get it to work.

bind_rows(dat1, dat2) %>%
  group_by(col)

result:
  col   n 
1  A    1
2  B    1
3  C    2
4  A    2
5  B    1
6  C    1
7  D    1

bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  count(n)

result:
  col   n   nn
1  A    1    1
2  A    2    1
3  B    1    2
4  C    1    1
5  C    2    1
6  D    1    1

How can I make dat3?
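For reference, the nn column in the second attempt appears because `count(n)` adds `n` as an extra grouping variable. It then counts the rows in each (col, n) combination. If you would rather keep `count()`, you can pass `n` as a weight so that rows are summed instead of counted. The sketch below assumes a dplyr version recent enough to have the `name` argument of `count()`. It also rebuilds dat1 and dat2 from the printed values:

```r
library(dplyr)

# dat1 and dat2 rebuilt from the values printed in the question
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1L, 1L, 2L),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 1L),
                   stringsAsFactors = FALSE)

# wt = n makes count() sum n within each col instead of counting rows;
# name = "n" keeps the output column called n, as in the desired dat3
dat3 <- bind_rows(dat1, dat2) %>%
  count(col, wt = n, name = "n")
dat3
```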

score 1 · Answer 1 · answered Dec 08 '19 at 15:02

You should summarise instead of counting:

library(dplyr)

bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  summarise(Sum = sum(n))

# A tibble: 4 x 2
  col     Sum
  <chr> <dbl>
1 A         3
2 B         2
3 C         3
4 D         1
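The question title asks about more than two data frames. `bind_rows()` also accepts a list of data frames, so the same summarise approach should work for any number of inputs. In this minimal sketch, `dat_extra` is a hypothetical third data frame added only for illustration:

```r
library(dplyr)

dat1 <- data.frame(col = c("A", "B", "C"), n = c(1L, 1L, 2L),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 1L),
                   stringsAsFactors = FALSE)
# Hypothetical third data frame, only to show more than two inputs
dat_extra <- data.frame(col = c("B", "E"), n = c(5L, 2L),
                        stringsAsFactors = FALSE)

# Stack every data frame in the list, then sum n within each col
dat_all <- bind_rows(list(dat1, dat2, dat_extra)) %>%
  group_by(col) %>%
  summarise(n = sum(n))
dat_all
```

A key that appears in only some of the inputs (D and E here) simply gets the sum of the rows where it does appear.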

dc37

I kept trying to reuse 'n'; it was a silly mistake. Thanks for your answer :) – KNOCK Dec 08 '19 at 16:37

r2evans · Answer 2 · 2019-12-08T15:35:30.500

Third option, just in case:

psum <- function(..., na.rm = TRUE) {
  m <- cbind(...)
  apply(m, 1, sum, na.rm = na.rm)
}

full_join(dat1, dat2, by = "col") %>%
  mutate(n = psum(n.x, n.y))
#   col n.x n.y n
# 1   A   1   2 3
# 2   B   1   1 2
# 3   C   2   1 3
# 4   D  NA   1 1

(The join generates the n.x and n.y columns because both inputs have a column named n; they are kept here only for demonstration. Yes, psum is a hack here; there is likely something better out there...)
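The same join-then-add idea can be done in base R without the `psum` helper. `merge(..., all = TRUE)` performs the full outer join, and `rowSums(na.rm = TRUE)` handles the NA-safe row-wise addition. This is a minimal sketch that assumes dat1 and dat2 hold the values from the question:

```r
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1L, 1L, 2L),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 1L),
                   stringsAsFactors = FALSE)

# all = TRUE gives a full outer join; the shared column n becomes n.x and n.y
merged <- merge(dat1, dat2, by = "col", all = TRUE)

# na.rm = TRUE treats the NA for D (absent from dat1) as contributing nothing
merged$n <- unname(rowSums(merged[, c("n.x", "n.y")], na.rm = TRUE))
merged
```

Inside the dplyr pipeline above, `mutate(n = rowSums(cbind(n.x, n.y), na.rm = TRUE))` should play the same role as `psum`.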

edited Dec 08 '19 at 15:35

answered Dec 08 '19 at 15:05

r2evans

I tried full_join and gather, too. It was easily solved by not reusing 'n'. Thanks for your answer! – KNOCK Dec 08 '19 at 16:44

akrun · Answer 3 · 2019-12-08T15:39:20.960

Or in base R:

aggregate(cbind(Sum = n) ~ col, rbind(df1, df2), FUN = sum)
#   col Sum
#1   A   3
#2   B   2
#3   C   3
#4   D   1

data (dat1 and dat2 from the question, as df1 and df2)

df1 <- structure(list(col = c("A", "B", "C"), n = c(1L, 1L, 2L)),
    class = "data.frame", row.names = c("1", "2", "3"))

df2 <- structure(list(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 1L)),
    class = "data.frame", row.names = c("1", "2", "3", "4"))
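Another base R option, not part of the original answer, is `rowsum()`. It sums a numeric vector within each value of a grouping vector and returns the groups, sorted, as row names. The sketch below uses the same df1 and df2, rebuilt with `data.frame()` so the block runs on its own:

```r
df1 <- data.frame(col = c("A", "B", "C"), n = c(1L, 1L, 2L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 1L),
                  stringsAsFactors = FALSE)

combined <- rbind(df1, df2)

# rowsum() returns a one-column matrix with one row per (sorted) group,
# with the group values stored as row names
s <- rowsum(combined$n, combined$col)

# Turn the matrix back into a two-column data frame
dat3 <- data.frame(col = rownames(s), n = unname(s[, 1]),
                   row.names = NULL, stringsAsFactors = FALSE)
dat3
```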

edited Dec 08 '19 at 15:39

answered Dec 08 '19 at 15:33

akrun

score 0 · Answer 4 · answered Dec 08 '19 at 15:04

data.table is a superior package to dplyr. I suggest you try it:

library(data.table)
dat1 <- setDT(dat1); dat2 <- setDT(dat2)

dat3 <- rbindlist(list(dat1, dat2))[, .(n = sum(n)), by = .(col)]

alexwhitworth

Just a naive question: why is `data.table` superior to `dplyr`? – dc37 Dec 08 '19 at 15:10

Your "superior" reference is completely contextual, and subject to a slew of opinion, experience, needs, etc. Not all comparison factors are based on time-to-compute. One step further: while I am getting more proficient with `data.table`, its readability -- especially for new R users -- can be daunting. Considering that this user seems to be *just starting* with `dplyr`, let's just stick with what they are "familiar" with. – r2evans Dec 08 '19 at 15:10

@r2evans I agree with your second point. But the superiority of data.table to dplyr is so well documented at this point that it is, IMO, reasonable to consider it a fact and not an opinion – alexwhitworth Dec 08 '19 at 15:16

@dc37 https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping http://dirk.eddelbuettel.com/blog/2018/01/21/ https://github.com/matloff/TidyverseSkeptic – alexwhitworth Dec 08 '19 at 15:18

Again, superiority is relative. If you mean *faster*, yes. If you mean *memory-efficient*, certainly. (And I agree whole-heartedly on both counts.) But it has also been argued many times that the conciseness of it is both a strength and a weakness, and please acknowledge that its syntax is enough at odds with base R (and other packages) to be confusing *to new R users*. It is the right tool for a lot of problems, but it is not the perfect tool for all problems. (Nothing fits that bill.) Nice set of links, btw, I only had two in my recent history :-) – r2evans Dec 08 '19 at 15:23

@AlexW You don't need `dat1 <- setDT(dat1)`, doing `setDT(dat1)` is enough. – markus Dec 08 '19 at 15:24

@r2evans Happy for us to take this chat elsewhere so that we don't distract from answering the OP's question. Yes, many people think the readability of `dplyr` is a strong value-add. Others disagree... On syntax, both `data.table` and `dplyr` have different syntax than base R. For instance, the use of piping... This discussion is getting far off topic. Let's either continue in chat or drop it. Cheers – alexwhitworth Dec 08 '19 at 15:27

@AlexW can you check this, please? https://stackoverflow.com/questions/59233401/in-r-how-to-create-multilevel-radiogroupbuttons-as-each-level-depends-choicena?noredirect=1#comment104681894_59233401 – John Smith Dec 08 '19 at 18:40
