
I have several records with a given attribute, and I want to find the standard deviation.

How do I do that?

Satchel

10 Answers

module Enumerable

    def sum
      self.inject(0){|accum, i| accum + i }
    end

    def mean
      self.sum/self.length.to_f
    end

    def sample_variance
      m = self.mean
      sum = self.inject(0){|accum, i| accum +(i-m)**2 }
      sum/(self.length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(self.sample_variance)
    end

end 

Testing it:

a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation  
# => 4.594682917363407
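For reference, these are the quantities the methods above compute for values x_1, ..., x_n. Dividing by n - 1 rather than n is what makes s^2 the sample variance; dividing by n gives the population variance instead:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```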

Edit (01/17/2012): fixed `sample_variance`, thanks to Dave Sag.

tolitius
  • There is an error in your `sample_variance` method. See my answer below. – Dave Sag Nov 24 '11 at 02:08
  • This code snippet is wrong. It does not return the correct STDEV. Please fix! – Steve Jan 17 '12 at 17:33
  • You don't need to write "self" or "return" so much in Ruby. – David Grayson Aug 11 '12 at 04:06
  • While this is fine, I prefer an approach without adding methods to Enumerable. I added my answer as well based on this: http://stackoverflow.com/a/21143604/90691 – marcgg Jan 15 '14 at 17:00
  • In the line `sum/(self.length - 1).to_f`, why are you subtracting 1 from the length of the Enumerable? – Cam Mar 30 '14 at 06:22
  • Also worth noting: this returns NaN for the variance and standard deviation when the array's length == 1. To fix, add `return 0.0 if length == 1` to the start of `sample_variance`. – Benji XVI Apr 28 '14 at 19:10
  • I think sum/(self.length - 1).to_f should be sum/length; I don't think the -1 is necessary, and it causes issues. – moger777 Jul 18 '14 at 17:36
  • There were a bunch of little things about this code snippet that bothered me, so I cleaned it up and posted my version here: https://gist.github.com/DavidEGrayson/e64962e8281ac8b1d637 – David Grayson Oct 03 '14 at 21:35
  • While I like it, I don't do Ruby, so take it as a "non-Rubyist" answer. But why link outside vs. just contribute here on SO by editing the answer? – tolitius Oct 04 '14 at 14:15
  • @moger777 The code is doing a sample standard deviation, not a population standard deviation, so the (n-1) is correct: http://www.macroption.com/population-sample-variance-standard-deviation/ – Ryan McCuaig Nov 29 '14 at 19:41
  • Sexiest sum ever: `def sum ; self.inject(:+) ; end` – Erowlin Feb 28 '17 at 17:35
  • Even sexier: in Ruby 2.4 there is now an Enumerable#sum, so we can use [1,2,3,4].sum – rtfminc Dec 07 '17 at 02:21
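Pulling the comment thread's suggestions together, here is a minimal sketch of the same methods, assuming Ruby 2.4 or later so the built-in sum can be used instead of redefining it. Returning 0.0 for fewer than two values follows Benji XVI's suggestion; extending it to empty collections is an assumption, and raising an error would be an equally reasonable choice.

```ruby
# Sketch assuming Ruby >= 2.4, where sum(init) { ... } is built in,
# so no custom `sum` is defined here (that would shadow the core method).
module Enumerable
  def mean
    sum(0.0) / count
  end

  # Sample variance: divides by n - 1. Returns 0.0 for fewer than two
  # values (an assumed convention) instead of NaN.
  def sample_variance
    return 0.0 if count < 2
    m = mean
    sum(0.0) { |x| (x - m)**2 } / (count - 1)
  end

  def standard_deviation
    Math.sqrt(sample_variance)
  end
end

a = [20, 23, 23, 24, 25, 22, 12, 21, 29]
puts a.standard_deviation # about 4.5947
```

The n - 1 divisor is kept, since this is the sample standard deviation discussed in the comments.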

It appears that Angela may have wanted an existing library. After playing with statsample, array-statisics, and a few others, I'd recommend the descriptive_statistics gem if you're trying to avoid reinventing the wheel.

$ gem install descriptive_statistics
$ irb
1.9.2 :001 > require 'descriptive_statistics'
 => true 
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
 => [1, 2, 2.2, 2.3, 4, 5] 
1.9.2p290 :003 > samples.sum
 => 16.5 
1.9.2 :004 > samples.mean
 => 2.75 
1.9.2 :005 > samples.variance
 => 1.7924999999999998 
1.9.2 :006 > samples.standard_deviation
 => 1.3388427838995882 

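I can't speak to its statistical correctness or your comfort with monkey-patching Enumerable, but it's easy to use and easy to contribute to. Note that, in the session above, variance and standard_deviation return population values (dividing by n): 1.7925 is 10.755 / 6, whereas the sample variance computed in the other answers is 10.755 / 5 = 2.151.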
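One way to check which variance a library reports is to compute both versions by hand for the same samples. This is a small standalone sketch that does not use the gem; it assumes Ruby 2.4+ for Array#sum with an initial value and a block.

```ruby
samples = [1, 2, 2.2, 2.3, 4, 5]
n = samples.size
mean = samples.sum(0.0) / n
# Sum of squared deviations from the mean.
ss = samples.sum(0.0) { |x| (x - mean)**2 }

population_variance = ss / n   # divides by n: about 1.7925
sample_variance = ss / (n - 1) # divides by n - 1: about 2.151

puts Math.sqrt(population_variance) # about 1.3388
puts Math.sqrt(sample_variance)     # about 1.4666
```

The population figures match the irb output above, which is how you can tell which definition the gem uses.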

Simon B.
eprothro
  • This is exactly the quick solution I was looking for. I don't know enough about statistics to check the work, but for anyone who just needs some basic statistics with minimal effort, it's a win. – genkilabs Dec 20 '12 at 21:46
  • Important note for Rails users: at this time, the descriptive_statistics gem appears to break ActiveRecord::Relation; you'll run into `NoMethodError: undefined method `zero?' for nil:NilClass` and `(Object doesn't support #inspect)`. – MrTheWalrus Dec 02 '13 at 19:22
  • I also couldn't use the descriptive_statistics gem because it uses the length method rather than size or count to get the size of an enumerable object, and some common enumerables like Vector don't implement length. – Ben Wheeler Oct 26 '15 at 22:46

The answer given above is elegant but has a slight error in it. Not being a stats head myself, I sat up and read a number of websites in detail, and found that this one gave the most comprehensible explanation of how to derive a standard deviation: http://sonia.hubpages.com/hub/stddev

The error in the answer above is in the sample_variance method.

Here is my corrected version, along with a simple unit test that shows it works.

In ./lib/enumerable/standard_deviation.rb:

#!/usr/bin/env ruby

module Enumerable

  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end

end

In ./test, using numbers derived from a simple spreadsheet:

[Screenshot: a Numbers spreadsheet with the example data]

#!/usr/bin/env ruby

# Assumes lib/ is on the load path, e.g. ruby -Ilib test/<this file>.rb
require 'test/unit'
require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase

  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  # Floats are compared with a tolerance; exact == on floats is fragile.
  def test_sum
    assert_in_delta 16.5, THE_NUMBERS.sum, 1e-10
  end

  def test_mean
    assert_in_delta 2.75, THE_NUMBERS.mean, 1e-10
  end

  def test_sample_variance
    assert_in_delta 2.151, THE_NUMBERS.sample_variance, 1e-10
  end

  def test_standard_deviation
    assert_in_delta 1.4666287874, THE_NUMBERS.standard_deviation, 1e-10
  end

end
Dave Sag
  • For Ruby 1.8.7, I changed the last assert to `assert result - expected < 1e-10`, added `require 'test/unit'`, and changed the first require to `require 'enumerable'`. – jtpereyda Sep 25 '13 at 16:53
  • I copied this code into my console and am getting 1.3388427838995882 as the standard deviation of the given array...? – sixty4bit Jul 17 '15 at 15:15

I'm not a big fan of adding methods to Enumerable, since there could be unwanted side effects. It also gives methods that are really specific to arrays of numbers to every class that includes Enumerable, which doesn't make sense in most cases.

While this is fine for tests, scripts or small apps, it's risky for larger applications, so here's an alternative based on @tolitius' answer which was already perfect. This is more for reference than anything else:

# Nested definition, so this also works when MyApp isn't defined yet.
module MyApp
  module Maths
    def self.sum(a)
      a.inject(0) { |accum, i| accum + i }
    end

    def self.mean(a)
      sum(a) / a.length.to_f
    end

    def self.sample_variance(a)
      m = mean(a)
      sum = a.inject(0) { |accum, i| accum + (i - m) ** 2 }
      sum / (a.length - 1).to_f
    end

    def self.standard_deviation(a)
      Math.sqrt(sample_variance(a))
    end
  end
end

Then you use it like this:

2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898

2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
 => [20, 23, 23, 24, 25, 22, 12, 21, 29]

2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
 => 4.594682917363407

2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
 => 1.466628787389638

The behavior is the same, but it avoids the overheads and risks of adding methods to Enumerable.
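Another way to address the same concern, not used in this answer, is a refinement (stable since Ruby 2.1): the methods are added to Array only in files that opt in with `using`, so other code never sees them. A minimal sketch, refining Array rather than Enumerable; the module name StatsRefinement is hypothetical:

```ruby
# Hypothetical module name; the methods mirror MyApp::Maths above.
module StatsRefinement
  refine Array do
    def mean
      inject(0) { |accum, i| accum + i } / length.to_f
    end

    def sample_variance
      m = mean
      inject(0) { |accum, i| accum + (i - m)**2 } / (length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(sample_variance)
    end
  end
end

# Activates the refinement for the rest of this file only.
using StatsRefinement

puts [1, 2, 3, 4, 5].standard_deviation # 1.5811388300841898
```

Files that do not call `using StatsRefinement` see plain Arrays, which keeps the side effects local, at the cost of a `using` line in every file that needs the methods.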

marcgg

The computations presented above are not very efficient, because they require several passes through the array (at least two, and often three, since you usually want to report the average as well as the standard deviation).

I know Ruby is not the place to look for efficiency, but here is my implementation that computes average and standard deviation with a single pass over the list values:

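module Enumerable

  # Returns [average, sample standard deviation] in a single pass.
  # Elements are counted inside the loop, because Enumerable#count
  # would need its own extra pass on non-Array collections.
  def avg_stddev
    n = sx = sx2 = 0
    each do |x|
      n += 1
      sx += x
      sx2 += x**2
    end
    return nil if n.zero?
    return [sx.to_f, 0.0] if n == 1
    # Sum-of-squares formula: http://wijmo.com/docs/spreadjs/STDEV.html
    variance = (sx2 - sx**2.0 / n) / (n - 1)
    # Rounding can push a near-zero variance slightly below zero.
    [sx.to_f / n, Math.sqrt([variance, 0.0].max)]
  end

end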
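The sum-of-squares formula can lose precision when the values are large relative to their spread, because sx2 and sx**2/n become two nearly equal large numbers. A commonly used single-pass alternative is Welford's online algorithm, which updates the mean and the sum of squared deviations incrementally. This is a swapped-in technique, not the method above, and the name avg_stddev_welford is hypothetical:

```ruby
module Enumerable
  # Welford's online algorithm: one pass, returns [mean, sample std dev].
  def avg_stddev_welford
    n = 0
    mean = 0.0
    m2 = 0.0 # running sum of squared deviations from the current mean
    each do |x|
      n += 1
      delta = x - mean
      mean += delta / n
      m2 += delta * (x - mean) # uses the updated mean
    end
    return nil if n.zero?
    return [mean, 0.0] if n == 1
    [mean, Math.sqrt(m2 / (n - 1))]
  end
end

p [2, 4, 4, 4, 5, 5, 7, 9].avg_stddev_welford
```

It does a little more arithmetic per element than the sum-of-squares version, but never subtracts two large accumulated totals.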
Guss
  • This is the difference between serving 100 people and 1000 people. If you got rid of the inject and the array inside its block, it could be 10.000 people :-) – nurettin Aug 29 '15 at 11:47
  • Something like this? I'm not familiar with the inefficiencies of `inject` and also I'm not sure what you have against the array except that it creates n objects - but these are short lived objects and shouldn't be a huge resource drain. – Guss Aug 29 '15 at 21:01
  • in my experience removing short lived objects from code increased performance a huge deal and it's the first place I look when I reach a bottleneck. It might be because of the heap allocation times of JVM (because I use JRuby most of the time) – nurettin Aug 30 '15 at 05:41

As a simple function, given a list of numbers:

def standard_deviation(list)
  mean = list.inject(:+) / list.length.to_f
  var_sum = list.map{|n| (n-mean)**2}.inject(:+).to_f
  sample_variance = var_sum / (list.length - 1)
  Math.sqrt(sample_variance)
end
tothemario

If the records at hand are of type Integer or Rational, you may want to compute the variance using Rational instead of Float to avoid errors introduced by rounding.

For example:

def variance(list)
  mean = list.reduce(:+)/list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences/list.length
end

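(Note that this divides by list.length, giving the population variance; divide by list.length - 1 instead for the sample variance. It would also be prudent to add special-case handling for empty lists and other edge cases.)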

The standard deviation is then the square root of the variance:

def std_dev(list)
  Math.sqrt(variance(list))
end
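Following the note above about edge cases, here is a sketch that keeps the result exact for Integer or Rational input and can return either the population or the sample variance. The name exact_variance and the sample: keyword are hypothetical, raising ArgumentError for too-short lists is one assumed way to handle the edge cases, and Array#sum with an initial value and a block needs Ruby 2.4+:

```ruby
# Exact variance for Integer or Rational inputs. With sample: false it
# divides by n (population); with sample: true it divides by n - 1.
def exact_variance(list, sample: false)
  n = list.length
  min = sample ? 2 : 1
  raise ArgumentError, "need at least #{min} values" if n < min

  mean = list.sum(Rational(0)) / n
  squared_diffs = list.sum(Rational(0)) { |x| (x - mean)**2 }
  squared_diffs / (sample ? n - 1 : n)
end

p exact_variance([1, 2, 3, 4])               # => (5/4)
p exact_variance([1, 2, 3, 4], sample: true) # => (5/3)
```

The rounding only happens at the very end, when Math.sqrt converts the exact Rational to a Float.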
Peter Kagey

In case people are using PostgreSQL: it provides the aggregate functions stddev_pop and stddev_samp (see the PostgreSQL documentation on aggregate functions).

stddev (equivalent to stddev_samp) has been available since at least PostgreSQL 7.1; since 8.2, both stddev_samp and stddev_pop are provided.

Straff

Or how about:

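class Stats
  attr_reader :avg, :stdev

  # Population standard deviation (divides by count); Array#sum needs Ruby 2.4+.
  def initialize(a)
    @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
    @stdev = a.count > 0 ? (a.reduce(0) { |sum, v| sum + (@avg - v) ** 2 } / a.count) ** 0.5 : 0.0
  end
end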

You can define this as a helper method and access it anywhere:

# Sample standard deviation (divides by n - 1).
# Array#sum with an initial value and a block requires Ruby 2.4+.
def calc_standard_deviation(arr)
  mean = arr.sum(0.0) / arr.size
  sum = arr.sum(0.0) { |element| (element - mean) ** 2 }
  variance = sum / (arr.size - 1)
  Math.sqrt(variance)
end
Roshan