
I would like to select one element from a list in Python so that the choice follows a normal distribution. I have a list, e.g.,

alist = ['an', 'am', 'apple', 'cool', 'why']

For example, according to the probability density function (PDF) of a normal distribution, the 3rd element in the given list should have the largest probability of being chosen. Any suggestions?

tdy
Frank Wang
  • 2
    The normal distribution is defined for a continuous, unbounded variable. In your case you could, for example, draw samples from a normal distribution, round them to integers, and drop values outside the bounds. This will be a normal-ish distribution, which may be what you want. – fjarri Feb 18 '16 at 03:51
  • 1
    Do be aware that a normal distribution does not have lower or upper bounds on output; it is vanishingly unlikely, but you *could* get a +20 sigmas value back. – Hugh Bothwell Feb 18 '16 at 03:53
  • 1
    There isn't any such thing as *the* normal distribution. It is a two-parameter family of distributions. You seem to want the mean to be the midpoint of your list, but that still leaves the variance up in the air. For some choices of variance, the choices would be virtually indistinguishable from uniformly chosen. For other choices, you would be returning the middle element almost always. You really need to clarify just what you want. – John Coleman Feb 18 '16 at 03:58
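
To make the point about variance concrete, here is a minimal sketch that computes the probability of each index for a few standard deviations. It assumes the round-and-reject scheme suggested in the comments above, with the mean at the middle of the list. The function name `index_probabilities` is made up for illustration, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def index_probabilities(n, stddev):
    """Probability of each index 0..n-1 when a normal sample centred on the
    middle of the list is rounded to the nearest integer and out-of-range
    values are discarded (hypothetical helper, for illustration only)."""
    dist = NormalDist(mu=(n - 1) / 2, sigma=stddev)
    # Probability mass of the rounding bin [i - 0.5, i + 0.5) for each index
    mass = [dist.cdf(i + 0.5) - dist.cdf(i - 0.5) for i in range(n)]
    # Renormalize, because samples that fall outside the list are rejected
    total = sum(mass)
    return [m / total for m in mass]

for stddev in (0.3, 1.0, 10.0):
    probs = index_probabilities(5, stddev)
    print(stddev, [round(p, 3) for p in probs])
```

A small standard deviation puts almost all of the probability on the middle element, while a large one makes the five probabilities nearly equal.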

2 Answers

from random import normalvariate

def normal_choice(lst, mean=None, stddev=None):
    if mean is None:
        # if mean is not specified, use center of list
        mean = (len(lst) - 1) / 2

    if stddev is None:
        # if stddev is not specified, let list be -3 .. +3 standard deviations
        stddev = len(lst) / 6

    while True:
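        # round to the nearest index; int(x + 0.5) truncates toward zero,
        # which would make index 0 also absorb samples in (-1.5, -0.5)
        index = int(round(normalvariate(mean, stddev)))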
        if 0 <= index < len(lst):
            return lst[index]

then

alist = ['an', 'am', 'apple', 'cool', 'why']
for _ in range(20):
    print(normal_choice(alist))

gives, for example (the output is random):

why
an
cool
cool
cool
apple
cool
apple
am
am
apple
apple
apple
why
cool
cool
cool
am
am
apple
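
As an alternative without a rejection loop, the PDF can be evaluated at each index and used directly as a selection weight. This is only a sketch of a different technique, not the method above: the name `pdf_weighted_choice` is made up, the default spread of `len(lst) / 6` is carried over from `normal_choice` as an assumption, and `statistics.NormalDist` needs Python 3.8 or later.

```python
import random
from statistics import NormalDist

def pdf_weighted_choice(lst, stddev=None):
    """Pick one element with probability proportional to the normal PDF
    evaluated at its index (hypothetical helper, for illustration only)."""
    n = len(lst)
    if n == 0:
        raise IndexError("cannot choose from an empty list")
    if stddev is None:
        # assumption: same default spread as normal_choice
        stddev = n / 6
    # Centre the distribution on the middle index of the list
    dist = NormalDist(mu=(n - 1) / 2, sigma=stddev)
    weights = [dist.pdf(i) for i in range(n)]
    # random.choices normalizes the weights internally
    return random.choices(lst, weights=weights, k=1)[0]

alist = ['an', 'am', 'apple', 'cool', 'why']
print([pdf_weighted_choice(alist) for _ in range(10)])
```

Because the weights come from the density at integer points rather than from rounding bins, the resulting probabilities differ slightly from those of the rounding approach.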
Hugh Bothwell

Are you sure you really want a normal distribution? You could look at a beta distribution instead. It is bounded between 0 and 1, so it maps naturally onto list indices and would probably give you what you need, e.g.:

>>> import random
>>> from collections import Counter
>>> alist = ['an', 'am', 'apple', 'cool', 'why']
>>> Counter(alist[int(random.betavariate(2, 2)*len(alist))] for _ in range(100))
Counter({'am': 20, 'an': 9, 'apple': 34, 'cool': 23, 'why': 14})
>>> Counter(alist[int(random.betavariate(10, 10)*len(alist))] for _ in range(1000))  
Counter({'am': 183, 'apple': 621, 'cool': 189, 'why': 4, 'an': 3})
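
The one-liner above can be wrapped in a small function. This is a sketch with a made-up name, `beta_choice`. The clamp on the index is a defensive assumption: the documentation only says that `random.betavariate` returns values between 0 and 1, so a sample of exactly 1.0 is treated here as possible, if very unlikely.

```python
import random

def beta_choice(lst, alpha=2.0, beta=2.0):
    """Pick one element using a beta-distributed position in the list
    (hypothetical helper, for illustration only)."""
    # betavariate returns a float between 0 and 1; scale it to an index
    index = int(random.betavariate(alpha, beta) * len(lst))
    # Clamp in case the sample is exactly 1.0 (assumed possible, if rare)
    return lst[min(index, len(lst) - 1)]

alist = ['an', 'am', 'apple', 'cool', 'why']
print([beta_choice(alist, 10, 10) for _ in range(10)])
```

With `alpha == beta` the distribution is symmetric about the middle of the list, and larger values concentrate the picks more tightly around it.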
AChampion