I have two data frames like this:

dat1
  col   n
1  A    1
2  B    1
3  C    2


dat2
  col   n
1  A    2
2  B    1
3  C    1
4  D    1

and I want to combine dat1 and dat2 into a data frame like this, where n is summed for each col:

dat3
  col   n
1  A    3
2  B    2
3  C    3
4  D    1

I'm trying to build dat3 with dplyr's bind_rows, group_by and count, but I can't get it to work.

bind_rows(dat1, dat2) %>%
  group_by(col)

result:
  col   n 
1  A    1
2  B    1
3  C    2
4  A    2
5  B    1
6  C    1
7  D    1

bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  count(n)

result:
  col   n   nn
1  A    1    1
2  A    2    1
3  B    1    2
4  C    1    1
5  C    2    1
6  D    1    1

How can I make dat3?

KNOCK

4 Answers

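You should summarise instead of counting. count(n) tallies how many rows share each value of n within a group, whereas you want the sum of n for each col: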

bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  summarise(Sum = sum(n))

# A tibble: 4 x 2
  col     Sum
  <chr> <dbl>
1 A         3
2 B         2
3 C         3
4 D         1
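
Since the question started from count, here is a minimal sketch of a count-based variant, assuming a dplyr version in which count accepts a wt argument. With wt = n, count sums the n column per group instead of tallying rows. The small data frames are rebuilt here only so the snippet runs on its own; the result column should be named n by default, though that may differ in older dplyr releases.

```r
library(dplyr)

# Illustrative copies of the question's data (rebuilt here as an assumption)
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1, 1, 2),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2, 1, 1, 1),
                   stringsAsFactors = FALSE)

# wt = n makes count() add up n within each col rather than count rows
dat3 <- bind_rows(dat1, dat2) %>%
  count(col, wt = n)

print(dat3)
```

This avoids the explicit group_by step, at the cost of being less obvious to readers who only know count as a row tally.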
dc37

Third option, just in case:

psum <- function(..., na.rm = TRUE) {
  m <- cbind(...)
  apply(m, 1, sum, na.rm = na.rm)
}

full_join(dat1, dat2, by = "col") %>%
  mutate(n = psum(n.x, n.y))
#   col n.x n.y n
# 1   A   1   2 3
# 2   B   1   1 2
# 3   C   2   1 3
# 4   D  NA   1 1

(The n.x and n.y columns are generated by the join because both inputs have a column named n; they are retained here solely for demonstration. Yes, psum is a hack here; there is likely something better out there ...)
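
One possibly simpler stand-in for psum is base R's rowSums, which already has an na.rm argument. The following is a sketch under that assumption, not a claim that it is the better option the answer alludes to. The data frames are rebuilt here so the snippet runs on its own, and the final select (an addition for illustration) drops the helper columns.

```r
library(dplyr)

# Illustrative copies of the question's data (rebuilt here as an assumption)
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1, 1, 2),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2, 1, 1, 1),
                   stringsAsFactors = FALSE)

dat3 <- full_join(dat1, dat2, by = "col") %>%
  # rowSums treats the NA produced by the join for "D" as missing and skips it
  mutate(n = rowSums(cbind(n.x, n.y), na.rm = TRUE)) %>%
  # keep only the key and the combined total
  select(col, n)

print(dat3)
```

Note that rowSums returns a double vector even when both inputs are integer.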

r2evans
  • I tried full_join and gather, too. It was easily solved without using 'n'. Thanks for your answer! – KNOCK Dec 08 '19 at 16:44

Or in base R,

aggregate(cbind(Sum = n) ~ col, rbind(df1, df2), FUN = sum)
#   col Sum
#1   A   3
#2   B   2
#3   C   3
#4   D   1

data

df1 <- structure(list(col = c("A", "B", "C"), n = c(1L, 1L, 2L)),
    class = "data.frame", row.names = c("1", "2", "3"))

df2 <- structure(list(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 1L)),
    class = "data.frame", row.names = c("1", "2", "3", "4"))
akrun

data.table is a superior package to dplyr. I suggest you try it:

library(data.table)
dat1 <- setDT(dat1); dat2 <- setDT(dat2)

dat3 <- rbindlist(list(dat1, dat2))[, .(n= sum(n)), .(col)]
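
For readers who want to run this on its own, here is a minimal self-contained sketch. It assumes that rbindlist accepts plain data frames in its input list, which would make the setDT conversion optional for this particular task. The data frames are rebuilt here for illustration only.

```r
library(data.table)

# Illustrative copies of the question's data (rebuilt here as an assumption)
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1, 1, 2),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2, 1, 1, 1),
                   stringsAsFactors = FALSE)

# Stack the rows into one data.table, then sum n within each col
dat3 <- rbindlist(list(dat1, dat2))[, .(n = sum(n)), by = col]

print(dat3)
```

Groups come back in order of first appearance, so no extra sorting is needed to match the layout in the question.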
alexwhitworth
  • Just a naive question: why is `data.table` superior to `dplyr`? – dc37 Dec 08 '19 at 15:10
  • Your "superior" reference is completely contextual, and subject to a slew of opinion, experience, needs, etc. Not all comparison factors are based on time-to-compute. One step further: while I am getting more proficient with `data.table`, its readability -- especially for new R users -- can be daunting. Considering that this user seems to be *just starting* with `dplyr`, let's just stick with what they are "familiar" with. – r2evans Dec 08 '19 at 15:10
  • @r2evans I agree with your second point. But the superiority of data.table to dplyr is so well documented at this point that it is, IMO, reasonable to consider it a fact and not an opinion – alexwhitworth Dec 08 '19 at 15:16
  • @dc37 https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping http://dirk.eddelbuettel.com/blog/2018/01/21/ https://github.com/matloff/TidyverseSkeptic – alexwhitworth Dec 08 '19 at 15:18
  • Again, superiority is relative. If you mean *faster*, yes. If you mean *memory-efficient*, certainly. (And I agree whole-heartedly on both counts.) But it has also been argued many times that the conciseness of it is both a strength and a weakness, and please acknowledge that its syntax is enough at odds with base R (and other packages) to be confusing *to new R users*. It is the right tool for a lot of problems, but it is not the perfect tool for all problems. (Nothing fits that bill.) Nice set of links, btw, I only had two in my recent history :-) – r2evans Dec 08 '19 at 15:23
  • @AlexW You don't need `dat1 <- setDT(dat1)`; calling `setDT(dat1)` is enough. – markus Dec 08 '19 at 15:24
  • @r2evans Happy for us to take this chat elsewhere so that we don't distract from answering the OP's question. Yes, many people think the readability of `dplyr` is a strong value-add. Others disagree... On syntax, both `data.table` and `dplyr` have different syntax than base R. For instance, the use of piping... This discussion is getting far off topic. Let's either continue in chat or drop it. Cheers – alexwhitworth Dec 08 '19 at 15:27
  • @AlexW can u check this please https://stackoverflow.com/questions/59233401/in-r-how-to-create-multilevel-radiogroupbuttons-as-each-level-depends-choicena?noredirect=1#comment104681894_59233401 – John Smith Dec 08 '19 at 18:40