43

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers:

http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473

However, I can not find it anywhere in current versions of scipy. Of course it is possible to just copy the old code from repository but I prefer to use scipy's version. Where can I find it, or has it been replaced or removed?

Manav Kataria
  • 5,060
  • 4
  • 24
  • 26
Ton van den Heuvel
  • 10,157
  • 6
  • 43
  • 82

10 Answers10

60

[EDIT] Since this keeps on getting downvoted: I know that median absolute deviation is a more commonly-used statistic, but the questioner asked for mean absolute deviation, and here's how to do it:

from numpy import mean, absolute

def mad(data, axis=None):
    return mean(absolute(data - mean(data, axis)), axis)
mhsmith
  • 6,675
  • 3
  • 41
  • 58
36

For what its worth, I use this for MAD:

def mad(arr):
    """ Median Absolute Deviation: a "Robust" version of standard deviation.
        Indices variabililty of the sample.
        https://en.wikipedia.org/wiki/Median_absolute_deviation 
    """
    arr = np.ma.array(arr).compressed() # should be faster to not use masked arrays.
    med = np.median(arr)
    return np.median(np.abs(arr - med))
Lee
  • 640
  • 7
  • 10
  • 3
    Nice solution; however, the questioner was asking about the **mean** absolute deviation. You've provided the **median** absolute deviation. – Sam Perry Jun 19 '17 at 11:15
31

The current version of statsmodels has mad in statsmodels.robust:

>>> import numpy as np
>>> from statsmodels import robust
>>> a = np.matrix( [
...     [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ],
...     [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ]
...  ], dtype=float )
>>> robust.mad(a, axis=1)
array([ 2.22390333,  5.18910776])

Note that by default this computes the robust estimate of the standard deviation assuming a normal distribution by scaling the result a scaling factor; from help:

Signature: robust.mad(a, 
                      c=0.67448975019608171, 
                      axis=0, 
                      center=<function median at 0x10ba6e5f0>)

The version in R makes a similar normalization. If you don't want this, obviously just set c=1.

(An earlier comment mentioned this being in statsmodels.robust.scale. The implementation is in statsmodels/robust/scale.py (see github) but the robust package does not export scale, rather it exports the public functions in scale.py explicitly.)

sfjac
  • 7,119
  • 5
  • 45
  • 69
  • 6
    Just note that this is **median**, not **mean** absolute deviation (and the latter was in the question) – marcin Dec 07 '18 at 08:54
  • @sfjac Good one. Can you please explain how the scaling is done to a array based on the constant `c=0.67448975019608171`? It would really help understand it better. I wasn't able to understand it clearly from the github page. It would really help. – Sai Kumar Dec 21 '18 at 10:39
  • @SaiKumar This constant is chosen so that if the data sample is drawn from a Gaussian population N(mu, sigma), the resulting statistic will be a robust estimate of sigma. – sfjac Dec 22 '18 at 16:49
  • @sfjac I didn't get that, what do you mean by robust estimate of sigma. Can you explain how the calculation is being done here. If you do `robust.mad(a, axis=1, c=1)` you get ouput as `[1.5,3.5]` and this is the correct `MAD` but why do we use `c=0.67 you get array as `[2.2239,5.1891]`. I want to know how it's being done. Does it multiple it with constant? I'm sorry I'm new to python. – Sai Kumar Dec 23 '18 at 10:35
  • Right. It computes the median of the absolute deviations from the sample median. This is a robust estimate of distribution width that is independent of the distribution. If the data is Gaussian, this value will be approximately equal to the standard deviation of the Gaussian divided by 0.67. By default the function rescales the result by this 0.67, and returns a statistic that is approximately equal to the standard deviation. – sfjac Dec 24 '18 at 16:56
17

It looks like scipy.stats.models was removed in august 2008 due to insufficient baking. Development has migrated to statsmodels.

bmu
  • 35,119
  • 13
  • 91
  • 108
matt
  • 4,089
  • 1
  • 20
  • 17
  • 7
    Yes, most of the old stats.models was the basis for scikits.statsmodels, after a lot of cleanup. MAD is at the bottom page here http://statsmodels.sourceforge.net/rlm.html as part of robust estimation of linear models but I never used it standalone since it's just a few lines. – Josef Jan 20 '12 at 04:48
  • 4
    The above link is broken, so I found [this one](http://statsmodels.sourceforge.net/devel/generated/statsmodels.robust.scale.mad.html?highlight=median%20absolute%20deviation) on the statsmodels documentation. – gabra Apr 11 '14 at 08:35
9

If you enjoy working in Pandas (like I do), it has a useful function for the mean absolute deviation:

import pandas as pd
df = pd.DataFrame()
df['a'] = [1, 1, 2, 2, 4, 6, 9]
df['a'].mad()

Output: 2.3673469387755106

Sam Perry
  • 2,554
  • 3
  • 28
  • 29
  • 1
    This is the best answer. Weird that this isn't built in to numpy... but since once can't have pandas without numpy one should really use pandas anyways! – HodorTheCoder Apr 26 '18 at 22:12
4

It's not the scipy version, but here's an implementation of the MAD using masked arrays to ignore bad values: http://code.google.com/p/agpy/source/browse/trunk/agpy/mad.py

Edit: A more recent version is available here.

Edit 2: There's also a version in astropy here.

keflavich
  • 18,278
  • 20
  • 86
  • 118
3

I'm just learning Python and Numpy, but here is the code I wrote to check my 7th grader's math homework which wanted the M(ean)AD of 2 sets of numbers:

Data in Numpy matrix rows:

import numpy as np

>>> a = np.matrix( [ [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ], \\    
... [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ] ], dtype=float )    
>>> matMad = np.mean( np.abs( np.tile( np.mean( a, axis=1 ), ( 1, a.shape[1] ) ) - a ), axis=1 )    
>>> matMad    
matrix([[ 1.81632653],
        [ 3.73469388]])

Data in Numpy 1D arrays:

>>> a1 = np.array( [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ], dtype=float )    
>>> a2 = np.array( [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ], dtype=float )    
>>> madA1 = np.mean( np.abs( np.tile( np.mean( a1 ), ( 1, len( a1 ) ) ) - a1 ) )    
>>> madA2 = np.mean( np.abs( np.tile( np.mean( a2 ), ( 1, len( a2 ) ) ) - a2 ) )    
>>> madA1, madA2    
(1.816326530612244, 3.7346938775510199)
RickC
  • 260
  • 3
  • 9
3

Using numpy only:

def meanDeviation(numpyArray):
    mean = np.mean(numpyArray)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(numpyArray))) / len(numpyArray)
tonix
  • 6,671
  • 13
  • 75
  • 136
2

I'm using:

from math import fabs

a = [1, 1, 2, 2, 4, 6, 9]

median = sorted(a)[len(a)//2]

for b in a:
    mad = fabs(b - median)
    print b,mad
duhaime
  • 25,611
  • 17
  • 169
  • 224
0

Do not want to be misleaded, the mad is now in scipy.stats:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.median_abs_deviation.html

Compute the median absolute deviation of the data along the given axis.

Gabriel
  • 40,504
  • 73
  • 230
  • 404
Frank
  • 505
  • 5
  • 14