0

I have a csv with multiple columns, with names like XX_0, XXX_1,...XXX_20

I want to add a new column to each row of the csv that is weighted sum of these columns. E.g DataFrame$WeightedX = 1*DataFRame$X_0+2*DataFRame$X_1+....+21**DataFRame$X_21

Is there a way to do this programmatically?

The code under manual works. I want to do something along the lines of the programmatic attempt below

Manual version works

findWeightedSumManual <- function(x){
weights <- c(1,10,20)
return(weights[1]*x$X_0 + weights[2]*x$X_1 + weights[3]*x$X_2)
}

Data = data.table(1:3,2:4,5:7)
colnames(Data) <- c('X_0', 'X_1', 'X_2')
Data$WeightedSum1 <- findWeightedSumManual(Data)

But I want to make it programmatic along the lines of

findWeightedSum <- function(x){
  weights <- c(1,10,20)
  sum <- 0

  for (i in 1:ncol(x)) {
    colName <- paste("X_",i,sep="")
    sum <- sum + weights[i]*x$colName
  }

  return(sum)
}

Data = data.table(1:3,2:4,5:7)
colnames(Data) <- c('X_0', 'X_1', 'X_2')
Data$WeightedSum2 <- findWeightedSum(Data)
Data
  • 1
    Don't use `$`. Please see [this answer I wrote earlier today](http://stackoverflow.com/a/39131241/903061), and my suggested duplicate: [Select a data frame column using $ and the name of the column as a string in a variable](http://stackoverflow.com/q/18222286/903061) – Gregor Thomas Aug 24 '16 at 22:34
  • Also, if all your columns are numeric then use matrix multiplication. Your function is `Data$WeightedSum1 = Data %*% weights`. – Gregor Thomas Aug 24 '16 at 22:37
  • Thanks, @Gregor. How does the matrix multiplication work if I have other columns in Data? – Arvind Swaminathan Aug 24 '16 at 23:03
  • @Gregor Getting following error Code: `weights <- c(1,10,20) Data$WeightedSum2 <- Data %*% weights` `Error in Data %*% weights : requires numeric/complex matrix/vector arguments Execution halted` – Arvind Swaminathan Aug 24 '16 at 23:09
  • Right - you would need to convert to a matrix to use matrix multiplication. Didn't include that. `as.matrix(Data) %*% weights`. Whether you want to convert to matrix in advance and keep it as a matrix or just use matrix multiplication for this one step depends on context. – Gregor Thomas Aug 24 '16 at 23:14
  • @Gregor One last question. If my csv has many other columns, how do I get a matrix from the pruned version of the DataFrame that only has columns of interest. This is what I tried `Data = data.table(1:3,2:4,5:7,8:10) colnames(Data) <- c('X_0', 'X_1', 'X_2', 'Bogus') myvars <- c('X_0', 'X_1', 'X_2') PrunedData <- Data[myvars] Data$WeightedSum2 <- as.matrix(PrunedData) %*% weights` but am getting error below – Arvind Swaminathan Aug 24 '16 at 23:56
  • Error is `Error in `[.data.table`(Data, myvars) : When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey. Calls: [ -> [.data.table` – Arvind Swaminathan Aug 24 '16 at 23:57
  • [This issue seems to be in play](http://stackoverflow.com/q/15007979/903061). I'm not a data.table expert, but I think this works `setkey(Data); Data[, WeightedSum2 := as.matrix(Data[, myvars, with = F]) %*% weights]`. – Gregor Thomas Aug 25 '16 at 00:15

0 Answers0