My question is similar to other questions that have been asked, such as this: Sum of rows based on column-value
Using data.table
in R, if I want to sum values in a column based on unique values in another column, I can do the following:
mre = data.table(
ref=rep(LETTERS[1:5],3),
qual=c('B','C','D','E','A','C','D','E','A','B','D','E','A','B','C'),
score=10:24
)
mre[,team_score:=sum(score),by=ref]
Or:
mre[,team_val:=sum(score),by=qual]
But how would I do both, simultaneously? That is, Add all elements in mre[,score]
for which mre[,ref]
or mre[,qual]
equal one of the five unique values (A through E). So, for A, the element in the new column would be the sum of all rows in score for which A occurs in either mre[,ref]
or mre[,qual]
(1,5,6,9,11,13).
I can do:
mre[ref=="A"|qual=="A",team_val:=sum(score)]
but that's terribly inefficient. Surely there is a better/smarter data.table
way to do this?
EDIT: Clarifying desired output, a better example
Rather than creating a new column and calculating sums by reference, which would lead to complications involving multiple values in the ref
and qual
columns, I can do this:
test=
merge(setkey(setnames(mre[,sum(score),by=ref],"ref","test_val"),test_val),
setkey(setnames(mre[,sum(score),by=qual],"qual","test_val"),test_val),
all=T)[,team_val:=rowSums(.SD,na.rm=T),.SDcols=c("V1.x","V1.y")]
which produces:
> test
test_val V1.x V1.y team_val
1: A 45 54 99
2: B 48 52 100
3: C 51 50 101
4: D 54 48 102
5: E 57 51 108
I'm hoping to do this in a way that doesn't require a merge. I only need the team_val
column, which I immediately use in a multiplication expression with a column from another data.table
(so I could just do:
team_val =
merge(setkey(setnames(mre[,sum(score),by=ref],"ref","test_val"),test_val),
setkey(setnames(mre[,sum(score),by=qual],"qual","test_val"),test_val),
all=T)[,rowSums(.SD,na.rm=T),.SDcols=c("V1.x","V1.y")]
to get:
> team_val
[1] 99 100 101 102 108
but that still requires the merge). I'm struggling to incorporate chaining (i.e., Section 2b).