Weighted means for all columns in R data.frame

Question

I have a 32x43 data.frame called "allg2", and I recreated a small portion of it here as 5x5 for simplicity:

gneiss mylonite syenite sedimentary Catg
0      3        4       0           -105.7
2      90       1       0           -99.7
15     51       0       0           -95.25
6      0        0       0           -90.5
0      3        9       0           -85.45

As requested, a sample calculation: The 'gneiss' column would be wm=(0/21*-105.7)+(2/21*-99.7)+(15/21*-95.25)+(6/21*-90.5)+(0/21*-85.45)
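
As a minimal sketch, the sample table above can be rebuilt as a data.frame and the hand calculation checked against `weighted.mean()`. The object names `w`, `v`, `wm_hand`, and `wm_fun` are illustrative. Note that in this 5-row sample the gneiss weights sum to 23 rather than 21, so the code normalizes by `sum(w)` instead of hard-coding the denominator.

```r
# The 5x5 sample from the question
allg2 <- data.frame(
  gneiss      = c(0, 2, 15, 6, 0),
  mylonite    = c(3, 90, 51, 0, 3),
  syenite     = c(4, 1, 0, 0, 9),
  sedimentary = c(0, 0, 0, 0, 0),
  Catg        = c(-105.7, -99.7, -95.25, -90.5, -85.45)
)

w <- allg2$gneiss  # weights: one rock-type column
v <- allg2$Catg    # values of interest

# Hand calculation: each weight divided by the weight total, times the value
wm_hand <- sum(w / sum(w) * v)

# Same thing via weighted.mean(x = values, w = weights)
wm_fun <- weighted.mean(v, w)

print(c(hand = wm_hand, fun = wm_fun))  # both about -94.4
```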

I would like a weighted mean for each column (with the values of interest in Catg, and each column as the weights for that column), but each solution to this that I can find relies on coding in all of the column names. Is it possible to do this without such a list? Note: I just realized that I have been flipping the weights and values to weigh the entire time. My attempts:

wm=allg2[,lapply(.SD,weighted.mean,w=Catg),by=list(allg2[1,])]
Error: unused argument (by = list(allg2[1, ]))

I found this idea in this thread and tried to adapt it to my situation. Is it not selecting the column names because they're not a true row? I don't really know what this is doing, and I tried removing the by= part, which gives the error

 Error in lapply(.SD, weighted.mean, w = Catg) : object '.SD' not found

Another attempt was based on this thread. "Catg" is in the 43rd column, so I tried organizing the line as such:

wm=apply(allg2, 2, function(x) weighted.mean(x[,43], x[,1:42]))
Error in x[, 43] : incorrect number of dimensions

I really don't understand this error, because my column of weights should be in [,43].
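
The error comes from how `apply()` works: with `MARGIN = 2` it hands each column to the function as a plain vector, so `x` has no second dimension to index with `x[, 43]`. A small sketch on the 5-column sample (where Catg is column 5 rather than 43) reproduces the error and shows one way to restructure the call. The data and object names are illustrative.

```r
allg2 <- data.frame(
  gneiss      = c(0, 2, 15, 6, 0),
  mylonite    = c(3, 90, 51, 0, 3),
  syenite     = c(4, 1, 0, 0, 9),
  sedimentary = c(0, 0, 0, 0, 0),
  Catg        = c(-105.7, -99.7, -95.25, -90.5, -85.45)
)

# Same shape as the failing call: inside the function, x is a single column
# (a vector), so x[, 5] fails. The error message is captured as a string.
res <- tryCatch(
  apply(allg2, 2, function(x) weighted.mean(x[, 5], x[, 1:4])),
  error = function(e) conditionMessage(e)
)
print(res)

# Restructured: loop over the weight columns only and take the values
# from allg2$Catg, which lives outside the function argument.
wm <- apply(allg2[, -ncol(allg2)], 2,
            function(w) weighted.mean(allg2$Catg, w))
print(wm)  # named numeric vector, one entry per weight column
```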

I have also tried:

mallg=data.matrix(allg2)
wm=colWeightedMeans(mallg,allg2$Catg)
Error in colWeightedMeans.matrix(mallg, allg2$Catg) : Argument 'w' has negative weights.
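
The message is literal: `Catg`, whose values are all negative, was passed as the weights `w`. `colWeightedMeans()` appears to apply a single weight vector to every column, which is the reverse of what the edited question needs (one value vector, a different weight vector per column). A base R sketch of that reversed calculation, assuming every column except Catg holds numeric weights; `W` and `v` are illustrative names:

```r
allg2 <- data.frame(
  gneiss      = c(0, 2, 15, 6, 0),
  mylonite    = c(3, 90, 51, 0, 3),
  syenite     = c(4, 1, 0, 0, 9),
  sedimentary = c(0, 0, 0, 0, 0),
  Catg        = c(-105.7, -99.7, -95.25, -90.5, -85.45)
)

W <- as.matrix(allg2[, -ncol(allg2)])  # weight columns as a numeric matrix
v <- allg2$Catg                        # values, one per row

# W * v multiplies every column of W element-wise by v (v has nrow(W) entries),
# so each column total is sum(w * v); dividing by sum(w) gives the weighted mean.
wm <- colSums(W * v) / colSums(W)
print(wm)  # a column with all-zero weights gives NaN (0 / 0)
```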

I'm really at a loss here. Am I making some small error or am I going about this the completely wrong way?

You've edited the question, it's too confusing. Could you please do a weighted mean by hand so that we can understand your problem? — juliohm, Nov 30 '13 at 17:00
Gladly, sorry for any confusion. The 'gneiss' column would be wm=(0/21*-105.7)+(2/21*-99.7)+(15/21*-95.25)+(6/21*-90.5)+(0/21*-85.45) — Riebeckite, Nov 30 '13 at 17:04
You should realize that dataframes are NOT the same as `data.table` objects. You are using `data.table` code on dataframes in your first erroneous attempt, and that's just not the way to succeed. — IRTFM, Nov 30 '13 at 17:05
If 21 is the sum of elements in Catg, then my answer is still valid. — juliohm, Nov 30 '13 at 17:11

score 3 · Accepted Answer · edited Nov 30 '13 at 17:16

Assuming that your weights are in the last column:

ll <- lapply(df[ , -ncol(df)], weighted.mean,  w = df$Catg)
ll
# $gneiss
# [1] 4.555497
# 
# $mylonite
# [1] 30.22283
# 
# $syenite
# [1] 2.709924
# 
# $sedimentary
# [1] 0

Edit: following your comment, you now need to do:

lapply(df[ , -ncol(df)], weighted.mean, x = df$Catg)
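
Run against the 5x5 sample from the question, the edited call gives one weighted mean per weight column. The sketch below (object names are illustrative) also shows the equivalent explicit form and one caveat: a column whose weights are all zero, like sedimentary here, yields NaN because the weights sum to zero.

```r
df <- data.frame(
  gneiss      = c(0, 2, 15, 6, 0),
  mylonite    = c(3, 90, 51, 0, 3),
  syenite     = c(4, 1, 0, 0, 9),
  sedimentary = c(0, 0, 0, 0, 0),
  Catg        = c(-105.7, -99.7, -95.25, -90.5, -85.45)
)
wts <- df[, -ncol(df)]  # every column except the last holds weights

# Naming x = df$Catg pushes each column into the next free argument, w
wm_named <- lapply(wts, weighted.mean, x = df$Catg)

# The same call with the roles spelled out
wm <- lapply(wts, function(w) weighted.mean(df$Catg, w))

# Keep only columns whose weights do not sum to zero, as a named vector
ok <- colSums(wts) != 0
wm_ok <- unlist(wm[ok])
print(wm_ok)
```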

answered Nov 30 '13 at 16:44 by Henrik · edited Nov 30 '13 at 17:16 by flodel

I just realized that I asked the wrong question; I had it flipped. The weights are in each respective column, and the values of interest are only in the final column. I will edit the original question, but how would I adapt this for that situation? – Riebeckite Nov 30 '13 at 16:56
`lapply(df[ , -ncol(df)], function(x) weighted.mean(df$Catg, w = x))` – IRTFM Nov 30 '13 at 17:04
Just make a minimal example that captures all the relevant characteristics of your data - not more, not less - e.g. two minerals, with columns for raw values and weights, in the relevant order. Then we'll see how to update the answers accordingly. – Henrik Nov 30 '13 at 17:06
@DWin, this worked perfectly! Thank you! Edit: I also just realized that you solved my last R problem. – Riebeckite Nov 30 '13 at 17:13

TheComeOnMan · Answer 2 · score 0 · 2013-11-30T17:06:48.083

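library(data.table)         # .SD is only defined inside a data.table's `[`
dt <- as.data.table(allg2)  # convert the data.frame first
dt[, lapply(.SD, weighted.mean, w = Catg)]
apply(dt, 2, function(col) weighted.mean(x = col, w = dt[, Catg]))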

I think you need to understand the arguments to each function better.

Update after the OP changed the question so that the weights are in the other columns and the values are in Catg:

dt[, lapply(.SD, weighted.mean, x = Catg)]
apply(dt, 2, function(col) weighted.mean(w = col, x = dt[, Catg]))
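
Without a restriction, `.SD` also contains Catg itself, so the result gains a Catg column weighted by its own values. A hedged sketch that limits `.SD` to the weight columns via `.SDcols`, assuming the data.table package is installed and using the question's 5x5 sample (the name `weight_cols` is illustrative):

```r
library(data.table)  # assumption: the data.table package is installed

dt <- data.table(
  gneiss      = c(0, 2, 15, 6, 0),
  mylonite    = c(3, 90, 51, 0, 3),
  syenite     = c(4, 1, 0, 0, 9),
  sedimentary = c(0, 0, 0, 0, 0),
  Catg        = c(-105.7, -99.7, -95.25, -90.5, -85.45)
)

# Every column except Catg is treated as a weight column
weight_cols <- setdiff(names(dt), "Catg")

# .SDcols restricts .SD; Catg is still visible inside j as a plain column
wm <- dt[, lapply(.SD, function(w) weighted.mean(Catg, w)),
         .SDcols = weight_cols]
print(wm)  # one-row data.table, one column per weight column
```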

answered Nov 30 '13 at 16:46 · edited Nov 30 '13 at 17:06

I am new to R so I'm still struggling with the terminology. Also, I realized I asked the wrong question and have updated it as such: the values of interest are in the final column, and the respective weights are in the other columns. – Riebeckite Nov 30 '13 at 16:58
Just flipped the arguments in the constructs :) – TheComeOnMan Nov 30 '13 at 17:05
This looks promising. However, I am getting Error in lapply(.SD, weighted.mean, w = Catg) : object '.SD' not found – Riebeckite Nov 30 '13 at 17:11
I do not think the OP understands that `dataframes != data.tables`. – IRTFM Nov 30 '13 at 17:13
Thanks @DWin :). OP, welcome to R. You might want to do `install.packages('data.table'); library(data.table); dt <- data.table(dt)` before trying out these constructs. Data tables are not the same as data frames; they are generally more efficient with memory and speed. I would highly recommend looking up data tables and trying them out. – TheComeOnMan Nov 30 '13 at 17:34

juliohm · Answer 3 · score 0 · 2013-11-30T17:18:07.647

I'm new to R, but why not:

sapply(allg2[,-ncol(allg2)], weighted.mean, allg2$Catg)
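
As written, this call passes `allg2$Catg` as the weights `w`, which matches the original form of the question and reproduces the 4.555497 for gneiss shown in the accepted answer. For the edited question the roles need to be swapped. A sketch on the sample data showing both, with illustrative names `as_written` and `flipped`:

```r
allg2 <- data.frame(
  gneiss      = c(0, 2, 15, 6, 0),
  mylonite    = c(3, 90, 51, 0, 3),
  syenite     = c(4, 1, 0, 0, 9),
  sedimentary = c(0, 0, 0, 0, 0),
  Catg        = c(-105.7, -99.7, -95.25, -90.5, -85.45)
)

# Original form: each column is x, Catg is w
as_written <- sapply(allg2[, -ncol(allg2)], weighted.mean, allg2$Catg)

# Edited question: Catg is x, each column is w.
# vapply() additionally checks that every result is a single number.
flipped <- vapply(allg2[, -ncol(allg2)],
                  function(w) weighted.mean(allg2$Catg, w),
                  numeric(1))

print(as_written)
print(flipped)
```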

answered Nov 30 '13 at 16:55 · edited Nov 30 '13 at 17:18
