2

i have 2 Tables with 2 columns each:

ID     Topic  
123     A  
124     B  
125     A  
126     C    

and:

ID     Freq  
123     20  
124     30   
125     10  
123     50  

the Result should be each topic with related sum of freq, like this:

Topic   Total  
  A      80  
  B      30  
  C       0  

how do i get this with R (working with RStudio) also the tables are really long ~ 50k lines

Sotos
  • 51,121
  • 6
  • 32
  • 66
maxey
  • 33
  • 3
  • 1
    what you have tried? 50k is not a big table anymore given RAM is much larger nowadays. – Sixiang.Hu Feb 21 '18 at 15:42
  • 1
    [Merge](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) and [sum per group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – Sotos Feb 21 '18 at 15:47
  • @Sotos You can also check [here](https://stackoverflow.com/questions/5769320/how-to-merge-two-data-frames-on-common-columns-in-r-with-sum-of-others) – akrun Feb 21 '18 at 16:34

2 Answers2

1

In base R:

df3 <- merge(df1,df2,all.x=TRUE)
df3[is.na(df3)] <- 0
aggregate(Freq ~ Topic,df3,sum)
#   Topic Freq
# 1     A   80
# 2     B   30
# 3     C    0

data

df1 <- read.table(text="ID     Topic  
123     A  
124     B  
125     A  
126     C",h=T,strin=F)    

df2 <- read.table(text="ID     Freq  
123     20  
124     30   
125     10  
123     50",h=T,strin=F)  
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
1

The tidyverse approach :

library(dplyr)
df1 %>%
  left_join(df2, by = "ID") %>%
  group_by(Topic) %>%
  summarize(Total = sum(Freq, na.rm = TRUE))
Julien Navarre
  • 7,653
  • 3
  • 42
  • 69