Where can I find mad (mean absolute deviation) in scipy?

Question

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers:

http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473

However, I can not find it anywhere in current versions of scipy. Of course it is possible to just copy the old code from repository but I prefer to use scipy's version. Where can I find it, or has it been replaced or removed?

Sorry, a search in the [github repository](https://github.com/scipy/scipy) gave me nothing. — Rik Poggi, Jan 19 '12 at 18:16
@RomanSusi, no, but as I stated in the question, that is not the point. — Ton van den Heuvel, Jan 20 '12 at 08:21
Beware, "MAD" usually refers to the "Median absolute deviation", **not** the mean difference: https://en.wikipedia.org/wiki/Mean_absolute_difference — Lucas Cimon, Jun 28 '14 at 18:16

mhsmith · Answer 1 · 2017-04-17T21:08:19.800

60

[EDIT] Since this keeps on getting downvoted: I know that median absolute deviation is a more commonly-used statistic, but the questioner asked for mean absolute deviation, and here's how to do it:

from numpy import mean, absolute

def mad(data, axis=None):
    return mean(absolute(data - mean(data, axis)), axis)

edited Apr 17 '17 at 21:08

answered Oct 19 '13 at 23:41

mhsmith

6,675
3
41
58

score 36 · Answer 2 · answered May 08 '14 at 07:56

36

For what its worth, I use this for MAD:

def mad(arr):
    """ Median Absolute Deviation: a "Robust" version of standard deviation.
        Indices variabililty of the sample.
        https://en.wikipedia.org/wiki/Median_absolute_deviation 
    """
    arr = np.ma.array(arr).compressed() # should be faster to not use masked arrays.
    med = np.median(arr)
    return np.median(np.abs(arr - med))

answered May 08 '14 at 07:56

Lee

640
7
10

3

Nice solution; however, the questioner was asking about the **mean** absolute deviation. You've provided the **median** absolute deviation. – Sam Perry Jun 19 '17 at 11:15

sfjac · Accepted Answer · 2016-11-15T21:15:55.477

31

The current version of statsmodels has mad in statsmodels.robust:

>>> import numpy as np
>>> from statsmodels import robust
>>> a = np.matrix( [
...     [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ],
...     [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ]
...  ], dtype=float )
>>> robust.mad(a, axis=1)
array([ 2.22390333,  5.18910776])

Note that by default this computes the robust estimate of the standard deviation assuming a normal distribution by scaling the result a scaling factor; from help:

Signature: robust.mad(a, 
                      c=0.67448975019608171, 
                      axis=0, 
                      center=<function median at 0x10ba6e5f0>)

The version in R makes a similar normalization. If you don't want this, obviously just set c=1.

(An earlier comment mentioned this being in statsmodels.robust.scale. The implementation is in statsmodels/robust/scale.py (see github) but the robust package does not export scale, rather it exports the public functions in scale.py explicitly.)

edited Nov 15 '16 at 21:15

answered Nov 15 '16 at 21:02

sfjac

7,119
5
45
69

6

Just note that this is **median**, not **mean** absolute deviation (and the latter was in the question) – marcin Dec 07 '18 at 08:54
@sfjac Good one. Can you please explain how the scaling is done to a array based on the constant `c=0.67448975019608171`? It would really help understand it better. I wasn't able to understand it clearly from the github page. It would really help. – Sai Kumar Dec 21 '18 at 10:39
@SaiKumar This constant is chosen so that if the data sample is drawn from a Gaussian population N(mu, sigma), the resulting statistic will be a robust estimate of sigma. – sfjac Dec 22 '18 at 16:49
@sfjac I didn't get that, what do you mean by robust estimate of sigma. Can you explain how the calculation is being done here. If you do `robust.mad(a, axis=1, c=1)` you get ouput as `[1.5,3.5]` and this is the correct `MAD` but why do we use `c=0.67 you get array as `[2.2239,5.1891]`. I want to know how it's being done. Does it multiple it with constant? I'm sorry I'm new to python. – Sai Kumar Dec 23 '18 at 10:35
Right. It computes the median of the absolute deviations from the sample median. This is a robust estimate of distribution width that is independent of the distribution. If the data is Gaussian, this value will be approximately equal to the standard deviation of the Gaussian divided by 0.67. By default the function rescales the result by this 0.67, and returns a statistic that is approximately equal to the standard deviation. – sfjac Dec 24 '18 at 16:56

score 17 · Answer 4 · edited Jun 27 '12 at 13:33

17

It looks like scipy.stats.models was removed in august 2008 due to insufficient baking. Development has migrated to statsmodels.

edited Jun 27 '12 at 13:33

bmu

35,119
13
91
108

answered Jan 19 '12 at 21:47

matt

4,089
1
20
17

7

Yes, most of the old stats.models was the basis for scikits.statsmodels, after a lot of cleanup. MAD is at the bottom page here http://statsmodels.sourceforge.net/rlm.html as part of robust estimation of linear models but I never used it standalone since it's just a few lines. – Josef Jan 20 '12 at 04:48
4

The above link is broken, so I found [this one](http://statsmodels.sourceforge.net/devel/generated/statsmodels.robust.scale.mad.html?highlight=median%20absolute%20deviation) on the statsmodels documentation. – gabra Apr 11 '14 at 08:35

score 9 · Answer 5 · answered Jun 19 '17 at 11:13

9

If you enjoy working in Pandas (like I do), it has a useful function for the mean absolute deviation:

import pandas as pd
df = pd.DataFrame()
df['a'] = [1, 1, 2, 2, 4, 6, 9]
df['a'].mad()

Output: 2.3673469387755106

answered Jun 19 '17 at 11:13

Sam Perry

2,554
3
28
29

1

This is the best answer. Weird that this isn't built in to numpy... but since once can't have pandas without numpy one should really use pandas anyways! – HodorTheCoder Apr 26 '18 at 22:12

keflavich · Answer 6 · 2017-12-21T20:40:14.727

4

It's not the scipy version, but here's an implementation of the MAD using masked arrays to ignore bad values: http://code.google.com/p/agpy/source/browse/trunk/agpy/mad.py

Edit: A more recent version is available here.

Edit 2: There's also a version in astropy here.

edited Dec 21 '17 at 20:40

answered Feb 25 '13 at 17:33

keflavich

18,278
20
86
118

score 3 · Answer 7 · answered May 05 '16 at 05:17

I'm just learning Python and Numpy, but here is the code I wrote to check my 7th grader's math homework which wanted the M(ean)AD of 2 sets of numbers:

Data in Numpy matrix rows:

import numpy as np

>>> a = np.matrix( [ [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ], \\    
... [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ] ], dtype=float )    
>>> matMad = np.mean( np.abs( np.tile( np.mean( a, axis=1 ), ( 1, a.shape[1] ) ) - a ), axis=1 )    
>>> matMad    
matrix([[ 1.81632653],
        [ 3.73469388]])

Data in Numpy 1D arrays:

>>> a1 = np.array( [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ], dtype=float )    
>>> a2 = np.array( [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ], dtype=float )    
>>> madA1 = np.mean( np.abs( np.tile( np.mean( a1 ), ( 1, len( a1 ) ) ) - a1 ) )    
>>> madA2 = np.mean( np.abs( np.tile( np.mean( a2 ), ( 1, len( a2 ) ) ) - a2 ) )    
>>> madA1, madA2    
(1.816326530612244, 3.7346938775510199)

score 3 · Answer 8 · answered Jan 10 '18 at 19:19

3

Using numpy only:

def meanDeviation(numpyArray):
    mean = np.mean(numpyArray)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(numpyArray))) / len(numpyArray)

answered Jan 10 '18 at 19:19

tonix

6,671
13
75
136

score 2 · Answer 9 · answered Apr 21 '14 at 16:21

2

I'm using:

from math import fabs

a = [1, 1, 2, 2, 4, 6, 9]

median = sorted(a)[len(a)//2]

for b in a:
    mad = fabs(b - median)
    print b,mad

answered Apr 21 '14 at 16:21

duhaime

25,611
17
169
224

This does not work when the number of sample is an even number. – heroxbd Nov 25 '16 at 12:52

score 0 · Answer 10 · edited Apr 06 '23 at 15:06

0

Do not want to be misleaded, the mad is now in scipy.stats:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.median_abs_deviation.html

Compute the median absolute deviation of the data along the given axis.

edited Apr 06 '23 at 15:06

Gabriel

40,504
73
230
404

answered Dec 09 '20 at 20:59

Frank

505
5
14

1

Are you saying it's now part of the library? – Scratte Dec 09 '20 at 21:26
@Frank it isn't entirely clear what you are trying to say. Please consider editing your answer and adding more details... – Sabito stands with Ukraine Dec 10 '20 at 21:07
@Scratte Yes it is in scipy.stats – Frank Dec 10 '20 at 22:34
This is the **median** absolute deviation, not the **mean** – Gabriel Apr 06 '23 at 15:05

Where can I find mad (mean absolute deviation) in scipy?

10 Answers10

Linked