40

I have a time series x_0 ... x_t. I would like to compute the exponentially weighted variance of the data. That is:

V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}

ref: http://en.wikipedia.org/wiki/Weighted_mean#Weighted_sample_variance

The goal is to basically weight observations that are further back in time less. This is very simple to implement but I would like to use as much built in funcitonality as possible. Does anyone know what this corresponds to in R?

Thanks

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
Alex
  • 19,533
  • 37
  • 126
  • 195
  • I'm guessing that this is an incomplete specification and that what you really want delivered will require better specification of how w_i is constructed and more detail on the limits of summation. – IRTFM Apr 07 '12 at 14:19

6 Answers6

38

R provides weighted mean. In fact, ?weighted.mean shows this example:

 ## GPA from Siegel 1994
 wt <- c(5,  5,  4,  1)/15
 x <- c(3.7,3.3,3.5,2.8)
 xm <- weighted.mean(x, wt)

One more step:

v <- sum(wt * (x - xm)^2)
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
  • Hmisc, as it turned out, does just this. – Alex Apr 08 '12 at 02:26
  • Note the last line in the answer. That is the weighted variance. – Matthew Lundberg Apr 08 '12 at 02:47
  • 9
    Just a hint... If you are thick skulled like me, 15 is the sum of the individual weights. Weights are then normalized in this case. I didn't catch that at first. – tharen Jun 13 '12 at 18:51
  • 1
    you forgot to divide v by n or n-1 – skan Jul 22 '16 at 15:18
  • 2
    @skan The formula above represents the population variance for the set. Note that `sum(wt) == 1` so multiplying by `wt` in the expression is that division. – Matthew Lundberg Jul 22 '16 at 15:28
  • it doesn't produce exactly the same value than the function wt.var(x, wt) from the library SMDtools, maybe they use some correction? – skan Jul 22 '16 at 15:40
  • 2
    It seems this answer currently works best. I think more consistent would be to have `v <- weighted.mean( (x-xm)^2, wt )` Since that also works when weights are not normalized. – Michael Lachmann Sep 06 '20 at 20:04
  • You should also consider whether your weights are frequency or reliability weights. See @Matifou below. – Alex Sep 24 '21 at 18:01
35

The Hmisc package contains the functions you need.

Thus:

x <- c(3.7,3.3,3.5,2.8)

wt <- c(5,  5,  4,  1)/15

xm <- wtd.mean(x, wt)

var <- wtd.var(x, wt)

sd <- sqrt(var)

Unfortunately the author of the Hmisc package did not include an explicit wtd.sd function. You have to square root wtd.var.

Charles Kangai

Waldir Leoncio
  • 10,853
  • 19
  • 77
  • 107
user3770859
  • 359
  • 3
  • 2
8

I too get errors from Hmisc when using the wtd.var() function. Fortunately, SDMTools has comparable functionality, and even calculates SD directly for you, without needing to take sqrt of variance.

library(SDMTools)

x <- c(3.7,3.3,3.5,2.8)
wt <- c(5,  5,  4,  1)/15  ## Note: no actual need to normalize weights to sum to 1, this will be done automatically.

wt.mean(x, wt)
wt.sd(x,wt)

wt.var(x, wt)
Michael Ohlrogge
  • 10,559
  • 5
  • 48
  • 76
3

Package Hmisc has function wt.var(), as noted by others.

Note that you need to understand whether you want frequency or reliability weights. In your case, I believe you are interested in reliability weights, so will need to set explicitely normwt=TRUE. In that case you can give your weights in any format (sum to 1, sum to N, etc). If you were to use frequency weights, you would need to be careful about how you specify weights.

library(Hmisc)

n <- 3
x <- seq_len(n)
w <- c(0.1, 0.2, 0.6)
w2 <- w / min(w)
w3 <- w / sum(w)

## reliability weights?
wtd.var(x = x, weights = w, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w2, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w3, normwt=TRUE)
#> [1] 0.95

## frequency weights?
wtd.var(x = x, weights = w)
#> Warning in wtd.var(x = x, weights = w): only one effective observation; variance
#> estimate undefined
#> [1] -4.222222
wtd.var(x = x, weights = w2)
#> [1] 0.5277778
wtd.var(x = x, weights = w3)
#> Warning in wtd.var(x = x, weights = w3): only one effective observation;
#> variance estimate undefined
#> [1] Inf

Created on 2020-08-26 by the reprex package (v0.3.0)

Matifou
  • 7,968
  • 3
  • 47
  • 52
  • The difference between frequency/reliability weights isn't given enough attention. +1 – Alex Sep 24 '21 at 18:00
0

Hmisc package provides this functionality:

http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=Hmisc:wtd.stats

Alex
  • 19,533
  • 37
  • 126
  • 195
0

For the Variance and Standard deviation you have to distinguish between biased estimate and frequency and reliability / sampling weights.

x <- c(1, 4, 5)
wf <- c(1, 2, 3)        # Frequency counts
ws <- c(0.1, 0.2, 0.3)  # Sampling weights

## Weighted mean
mean(rep(x, wf))        # Works only for integer frequencys
#[1] 4
sum(x * wf) / sum(wf)
#[1] 4
sum(x * ws) / sum(ws)
#[1] 4
weighted.mean(x, wf)
#[1] 4
weighted.mean(x, ws)
#[1] 4

## Frequency counts
var(rep(x, wf))                        # Variance
#[1] 2.4
sd(rep(x, wf))                         # Standard deviation
#[1] 1.549193
sw <- sum(wf)
xm <- sum(x * wf) / sw
sum(wf * (x - xm)^2) / (sw - 1)        # Variance
#[1] 2.4
sqrt(sum(wf * (x - xm)^2) / (sw - 1))  # Standard deviation
#[1] 1.549193

## Sampling weights
xm <- weighted.mean(x, ws)
sum(ws *(x-xm)^2)*(sum(ws)/(sum(ws)^2-sum(ws^2)))  # Variance
#[1] 3.272727
cov.wt(matrix(x, ncol=1), ws)$cov                  # Variance
#[1,] 3.272727

## BIASED weighted sample variance
xm <- weighted.mean(x, ws)
sum(ws * (x - xm)^2) / sum(ws)  # Variance
#[1] 2
xm <- weighted.mean(x, wf)
sum(wf * (x - xm)^2) / sum(wf)  # Variance
#[1] 2

## Using Hmisc
Hmisc::wtd.var(x, wf)
#[1] 2.4
Hmisc::wtd.var(x, ws, normwt=TRUE)
#[1] 3.272727
GKi
  • 37,245
  • 2
  • 26
  • 48