Summing the products of multiple variables per row

Question

I have a data.table as follows:

library(data.table)
set.seed(1)
DT <- data.table(panelID = sample(50,50),                                                    # Creates a panel ID
                      Country = c(rep("Albania",30),rep("Belarus",50), rep("Chilipepper",20)),       
                      some_NA = sample(0:5, 6),                                             
                      some_NA_factor = sample(0:5, 6),         
                      Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                      norm = round(runif(100)/10,2),
                      Income = sample(0:5, 6),
                      Happiness = sample(10,10),
                      Sex = round(rnorm(10,0.75,0.3),2),
                      Age = sample(100,100),
                      Educ = round(rnorm(10,0.75,0.3),2))           
DT [, uniqueID := .I]                                                                        # Creates a unique ID     
DT[DT == 0] <- NA                                                                            # https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na
DT$some_NA_factor <- factor(DT$some_NA_factor)

Now I would like to (for some artificial reason) sum the products of Income * Educ and Sex * Age for each observation, using data.table. Please note that my actual data has many more variables, some of which contain NAs. I tried:

DT<- setDT(DT)[, newvar:= sum((Income *Educ),
   (Sex * Age), na.rm=TRUE)]

But that sums over all rows and assigns the same total to every row. I also tried:

DT<- setDT(DT)[, newvar:= rowSums((Income *Educ),
   (Sex * Age), na.rm=TRUE)]

But that does not work:

Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 
  'x' must be an array of at least two dimensions

What would be the correct way to do this in data.table?

See also https://stackoverflow.com/questions/31258547/data-table-row-wise-sum-mean-min-max-like-dplyr — Cole, Oct 11 '19 at 11:41

s_baldur · Accepted Answer · 2019-10-11T11:46:00.327


DT[, newvar := rowSums(data.table(Income*Educ, Sex * Age), na.rm=TRUE)]

# Alternatively (fifelse() requires data.table >= 1.12.4):
DT[, newvar := {x = Income*Educ; y = Sex * Age; fifelse(is.na(x), y, fifelse(is.na(y), x, x + y ))}]
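
The two variants above are not fully interchangeable when both products are NA in the same row. A minimal sketch on a toy table (the name `toy` and its values are made up for illustration, not taken from the question's data) shows the difference: rowSums(..., na.rm=TRUE) yields 0 for such a row, while the nested fifelse() yields NA.

```r
library(data.table)

# Hypothetical toy data covering all four NA combinations
toy <- data.table(Income = c(2, NA, 3, NA),
                  Educ   = c(0.5, 1, 1, 1),
                  Sex    = c(1, 1, NA, NA),
                  Age    = c(10, 20, 30, 40))

# Variant 1: collect the products in a two-column table, then sum across columns
toy[, via_rowSums := rowSums(data.table(Income * Educ, Sex * Age), na.rm = TRUE)]

# Variant 2: nested fifelse(), which needs data.table >= 1.12.4
toy[, via_fifelse := {
  x <- Income * Educ
  y <- Sex * Age
  fifelse(is.na(x), y, fifelse(is.na(y), x, x + y))
}]

print(toy)
# via_rowSums: 11 20 3  0
# via_fifelse: 11 20 3 NA
```

Which behaviour is preferable depends on whether a row with no usable products should count as 0 or stay missing.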

Note:

setDT() is only necessary if the object is a data.frame and not yet a data.table. Assigning the result with <- is not needed when you use := inside the data.table, because := modifies it by reference.
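
As a small illustration of this note, the sketch below starts from a plain data.frame (the name `df` and its columns are hypothetical) and shows that neither setDT() nor := needs its result assigned back.

```r
library(data.table)

# Hypothetical plain data.frame
df <- data.frame(a = 1:3, b = 4:6)

setDT(df)            # converts df to a data.table in place, no `df <-` needed
df[, ab := a * b]    # := adds the column by reference, no `df <-` needed

is.data.table(df)    # TRUE
names(df)            # "a" "b" "ab"
```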

edited Oct 11 '19 at 11:46

answered Oct 11 '19 at 11:38

I installed `data.table_1.12.4` from CRAN a few days ago. Also see item 21 in their NEWS file on GitHub: https://github.com/Rdatatable/data.table/blob/master/NEWS.md – s_baldur Oct 11 '19 at 16:16
see also here: https://www.rdocumentation.org/packages/data.table/versions/1.12.4/topics/fifelse – s_baldur Oct 11 '19 at 16:17
@akrun maybe try `install.packages('data.table', repos='http://cran.us.r-project.org')` – s_baldur Oct 11 '19 at 16:56
OK, thanks. I accidentally chose the `no` option earlier, so it did not install from source, which is the 1.12.4 version – akrun Oct 11 '19 at 16:58
