0

I have an array of floating-point data and would like to pick out the most probable value, which is called the "mode" in descriptive statistics. How can I calculate it in Ruby, either directly or with the help of a gem?

Konstantin
  • Possible duplicate of [Ruby: How to find item in array which has the most occurrences?](http://stackoverflow.com/questions/412169/ruby-how-to-find-item-in-array-which-has-the-most-occurrences) – theTRON Jun 26 '14 at 01:32
  • Thanks, but I think that algorithm is useless with floating-point data. – Konstantin Jun 26 '14 at 01:51
  • @Konstantin, why do you think so? That answer works perfectly for floats. There is nothing wrong with using a float as a Hash key in Ruby. – huocp Jun 26 '14 at 02:12
  • @theTRON is correct, the method in the first answer will work for you. – Anthony Jun 26 '14 at 02:14
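To illustrate the point made in the comments above, here is a minimal sketch of counting floats with a Hash. The sample array and the names `data`, `counts`, and `mode` are illustrative assumptions. Floats are matched by exact value, so numbers that differ only by rounding error end up under different keys.

```ruby
data = [0.0, 0.1, 0.2, 0.1, 0.3, 0.3, 0.1]

# Floats are valid Hash keys; equal values map to the same entry.
counts = Hash.new(0)
data.each { |x| counts[x] += 1 }

# Pick the key with the highest count.
mode = counts.max_by { |_value, count| count }.first
puts mode  # 0.1

# Caveat: keys are compared exactly, so results of different
# calculations may not coincide even when they "should" be equal.
puts((0.1 + 0.2) == 0.3)  # false
```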

3 Answers

1
[0.0, 0.1, 0.2, 0.1, 0.3, 0.3, 0.1]
.group_by{|e| e}.max_by{|k, v| v.length}.first
# => 0.1
sawa
1

DescriptiveStatistics adds methods to the Enumerable module to allow easy calculation of basic descriptive statistics of Numeric sample data in collections that have included Enumerable such as Array, Hash, Set, and Range.

> require 'descriptive_statistics'
> [0.0, 0.1, 0.2, 0.1, 0.3, 0.3, 0.1].mode
=> 0.1
0

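The following works for bimodal and multimodal datasets, but returns only a single value: when several values tie for the highest count, it returns the one that occurs first in the array. Note that it calls count once per element, so its running time is quadratic in the size of the array.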

# returns 1.0
a = [1.0, 1.0, 2.0, 2.0, 3.0]
a.max_by { |x| a.count(x) }
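If all modes of a multimodal dataset are wanted, one possible approach is to count each value once and keep every value that reaches the highest count. The sketch below is only an illustration: `modes` is a hypothetical helper name, and `transform_values` assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: returns every value that shares the highest count.
def modes(values)
  return [] if values.empty?

  # Count occurrences of each value in a single pass.
  counts = values.group_by { |x| x }.transform_values(&:size)
  top = counts.values.max

  # Keep all values whose count equals the maximum,
  # in order of first appearance.
  counts.select { |_value, count| count == top }.keys
end

p modes([1.0, 1.0, 2.0, 2.0, 3.0])  # [1.0, 2.0]
p modes([1.0, 1.0, 2.0, 3.0])       # [1.0]
```

Because the counts are built once, this avoids calling count for every element.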

You can also try the easystats gem. It adds a .mode method to Arrays (among other methods), but it returns nil for bimodal or multimodal datasets.

require 'easystats'

# returns 1.0
a = [1.0, 1.0, 2.0, 3.0]
a.mode 

# returns nil
a = [1.0, 1.0, 2.0, 2.0, 3.0]
a.mode
infused
  • Your first piece of code will work, but is inefficient. – sawa Jun 26 '14 at 02:40
  • This is true. The fastest method appears to be `a.group_by {|e| e}.values.max_by{|e| e.size}.first`, which was posted by @Brandon in the duplicate post mentioned above. – infused Jun 26 '14 at 03:50
  • Thanks, I see, but my floating-point numbers fluctuate slightly because they come from different calculations. For example, 1.00001 and 1.00000 should both be treated as 1.0. – Konstantin Jun 26 '14 at 18:16
  • On top of this, my floating-point numbers come in pairs, because they are the parameters of a line in a coordinate system (y=a*x+b). In fact my data is two-dimensional, so some more advanced method is needed. I don't think I can calculate the mode of the "a" values and of the "b" values separately, because they are "attached". – Konstantin Jun 26 '14 at 18:26
  • In that case, use [Float#round](http://www.ruby-doc.org/core-2.1.2/Float.html#method-i-round) to round each value to a specific precision: `rounded = a.map {|n| n.round(1)}` – infused Jun 26 '14 at 18:26
  • @Konstantin, it might help if you can update your question with some example data. – infused Jun 26 '14 at 18:29
  • Okay, here is my sample data: http://pastebin.com/krAh1yUC The pairs of floating-point numbers represent the parameters of a linear transformation, y=a*x+b, so the values in the array are [a,b] pairs, 45 pairs in total. One can see that the mode is a=1.0, b=0.4. – Konstantin Jun 26 '14 at 21:52
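One way to combine the rounding suggestion with the paired data described in these comments is to round both parameters of each pair and use the rounded pair itself as the grouping key, so that a and b stay attached. The following is a rough sketch under assumptions: the sample pairs are made up for illustration (they are not the pastebin contents), and `pair_mode` and its `digits` parameter are hypothetical names.

```ruby
# Made-up [a, b] pairs with small numerical noise;
# NOT the data from the pastebin link.
pairs = [
  [1.00001, 0.40002],
  [0.99999, 0.39998],
  [1.00000, 0.40000],
  [2.5,     1.2],
  [0.7,     0.1]
]

# Hypothetical helper: rounds a and b together and uses the rounded
# pair (an Array) as the key, so the two parameters are never separated.
def pair_mode(pairs, digits = 1)
  pairs
    .group_by { |a, b| [a.round(digits), b.round(digits)] }
    .max_by { |_key, group| group.size }
    .first
end

p pair_mode(pairs)  # [1.0, 0.4]
```

Rounding is effectively a fixed-width binning, so two values that straddle a bin edge (for example 1.049 and 1.051 with one decimal place) would still be counted separately. The number of digits has to be chosen to match the noise in the data.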