
How to create np array random data on age vs time?

My aim is to create a scatter plot representing random data on age vs. time spent watching TV.

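import matplotlib.pyplot as plt
from numpy.random import randn  # the same randn that pylab re-exports

X = randn(500)  # 500 standard normal samples (unbounded, mean 0)
Y = randn(500)
plt.scatter(X, Y)
plt.show()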

I want age between 18 and 50 and time between 0 and 24 hours.

3 Answers


You can try random.sample from the standard library:

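import random
import numpy as np

# Draw 10 distinct values each (random.sample never repeats an element)
age = np.array(random.sample(list(range(18, 51)), 10))   # ages 18..50 inclusive
time = np.array(random.sample(list(range(0, 24)), 10))   # hours 0..23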

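random.sample takes the population to draw from as its first argument and the number of samples as its second. It samples without replacement, so the number of samples cannot exceed the population size (33 possible ages, 24 possible hours).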

That gives, for example:

age  : [47 45 37 19 23 34 39 24 32 42]
time : [18 12 13  1 15 21 23 22  3 17]

Plotting it:

import matplotlib.pyplot as plt
plt.scatter(age, time)
plt.show()


To get the same random numbers every time you run it, call random.seed() with a fixed value (for example random.seed(0)) before sampling.
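Because random.sample never repeats a value, it cannot produce the 500 points of the original question. A minimal sketch, assuming repeated values are acceptable, uses random.choices (which samples with replacement) and shows that re-seeding reproduces the same draws; the seed 42 is an arbitrary choice:

```python
import random

random.seed(42)  # fixed seed: the same "random" values on every run

n_samples = 500
# random.choices samples WITH replacement, so k may exceed the population size
age = random.choices(range(18, 51), k=n_samples)   # integers 18..50 inclusive
time = random.choices(range(0, 24), k=n_samples)   # integers 0..23

# Re-seeding with the same value reproduces exactly the same sequence
random.seed(42)
age_again = random.choices(range(18, 51), k=n_samples)
print(age == age_again)  # True
```

random.choices returns plain Python lists; wrap them in np.array(...) if you need NumPy arrays.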

Sruthi

It's easy with NumPy: np.random.randint(low, high, size) draws integers from low up to, but not including, high.

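import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  (uncomment only when running in a Jupyter notebook)

age = np.random.randint(18, 51, 20)   # 20 integer ages, 18..50 (high is exclusive)
time = np.random.randint(0, 24, 20)   # 20 integer hours, 0..23

plt.scatter(age, time)
plt.show()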
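Since the upper bound of np.random.randint is exclusive, randint(18, 51, ...) is what includes age 50. A minimal sketch of the same idea with NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17), whose integers() method also excludes high unless endpoint=True; the seed and sample count are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducible results

n_samples = 500
# integers(low, high) excludes high by default, so use 51 to include age 50
age = rng.integers(18, 51, size=n_samples)
# integer hours 0..23; pass endpoint=True to allow exactly 24
time = rng.integers(0, 24, size=n_samples)

print(age.min() >= 18, age.max() <= 50)  # True True
```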


YOLO

Column-wise multiplication in numpy

You can easily create random arrays of any shape with NumPy: numpy.random.rand(d0, d1, …, dn) for uniform distributions or numpy.random.randn(d0, d1, …, dn) for normal distributions, where di is the size of the i-th dimension. In your case you'll have d0=500 (samples) and d1=2 (age and time).

However, numpy.random.rand(d0, d1, …, dn) samples from the interval [0, 1), and numpy.random.randn(d0, d1, …, dn) samples from the standard normal distribution (mean 0, variance 1).
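A quick check of both functions with the shape from the question (500 samples, 2 columns). The seed is arbitrary, and the empirical mean and standard deviation of the randn columns are only approximately 0 and 1:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for reproducibility

u = np.random.rand(500, 2)    # uniform samples in [0, 1), shape (500, 2)
z = np.random.randn(500, 2)   # standard normal samples, shape (500, 2)

print(u.shape, u.min() >= 0.0, u.max() < 1.0)   # (500, 2) True True
print(z.mean(axis=0), z.std(axis=0))            # each column close to 0 and 1
```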

A convenient workaround is to scale and shift the columns of the array so that each follows the desired distribution. To multiply each column of an array arr by the corresponding entry of a vector vec, you can use the snippet arr.dot(np.diag(vec)). Be careful: vec must have as many elements as arr has columns.

This snippet works by turning vec into a diagonal matrix (a matrix that is zero everywhere except on the main diagonal) and then multiplying arr by that diagonal matrix, which scales column j of arr by vec[j].
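A small check of the diagonal-matrix trick on a hand-made array (the values are arbitrary). It also compares the result with NumPy broadcasting, where arr * vec multiplies column j by vec[j] without building the diagonal matrix:

```python
import numpy as np

arr = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])   # 3 samples, 2 columns
vec = np.array([10.0, 100.0])  # one factor per column

via_diag = arr.dot(np.diag(vec))  # np.diag(vec) is [[10, 0], [0, 100]]
via_broadcast = arr * vec         # broadcasting multiplies column j by vec[j]

print(via_diag)
# [[ 10. 200.]
#  [ 30. 400.]
#  [ 50. 600.]]
print(np.allclose(via_diag, via_broadcast))  # True
```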

For uniform distributions

Remember that to map a sample x from the uniform distribution on [0, 1) to [min, max), you compute new_x = (max - min) * x + min. So if you want a uniform distribution and you know the min and max limits for both variables, you can use the following code:

import numpy as np

n_samples = 500
max_age, min_age = 50, 18      # age range from the question
max_hours, min_hours = 24, 0   # hours per day

array = np.random.rand(n_samples, 2)  # samples from the uniform distribution on [0, 1)
range_vector = np.array([max_age - min_age, max_hours - min_hours])
min_vector = np.array([min_age, min_hours])

# Scale each column by its range, then add each column's minimum
sample = array.dot(np.diag(range_vector)) + np.ones(array.shape).dot(np.diag(min_vector))
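The same scaling can also be written with broadcasting (adding min_vector directly instead of np.ones(...).dot(...)), and NumPy's Generator.uniform accepts one low/high pair per column. A minimal sketch using the question's ranges (ages 18 to 50, 0 to 24 hours); the seed is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

n_samples = 500
min_vector = np.array([18.0, 0.0])   # [min_age, min_hours]
max_vector = np.array([50.0, 24.0])  # [max_age, max_hours]

# Option 1: scale and shift [0, 1) samples column-wise via broadcasting
x = rng.random((n_samples, 2))
sample1 = (max_vector - min_vector) * x + min_vector

# Option 2: let NumPy broadcast the per-column bounds directly
sample2 = rng.uniform(low=min_vector, high=max_vector, size=(n_samples, 2))

age, hours = sample2[:, 0], sample2[:, 1]  # split into the two variables
```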

Normal distributions

If you want a normal distribution and you know the mean and standard deviation of both columns, use the following code. Remember that to shift a sample x from the standard normal distribution to a distribution with a different mean and standard deviation, you compute new_x = deviation * x + mean.

import numpy as np

n_samples = 500
mean_age, deviation_age = 40, 20
mean_hours, deviation_hours = 5, 2

array = np.random.randn(n_samples, 2)  # samples from the standard normal distribution
deviation_vector = np.array([deviation_age, deviation_hours])
mean_vector = np.array([mean_age, mean_hours])

# Scale each column by its standard deviation, then add each column's mean
sample = array.dot(np.diag(deviation_vector)) + np.ones(array.shape).dot(np.diag(mean_vector))
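As in the uniform case, the shift can be written with broadcasting, and NumPy's Generator.normal accepts per-column loc and scale arrays. A minimal sketch with the same means and deviations as above; the seed is an arbitrary assumption, and the empirical statistics only approximate the targets:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

n_samples = 500
mean_vector = np.array([40.0, 5.0])       # [mean_age, mean_hours]
deviation_vector = np.array([20.0, 2.0])  # [deviation_age, deviation_hours]

# Option 1: scale and shift standard normal samples with broadcasting
z = rng.standard_normal((n_samples, 2))
sample1 = deviation_vector * z + mean_vector

# Option 2: pass per-column loc and scale directly
sample2 = rng.normal(loc=mean_vector, scale=deviation_vector, size=(n_samples, 2))

print(sample2.mean(axis=0), sample2.std(axis=0))  # roughly [40, 5] and [20, 2]
```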

Be careful, however: with normal distributions you can end up with negative values, or more generally with values outside the range you want (for example ages below 18).
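One way to avoid out-of-range values, swapped in here rather than taken from the answer above, is simple rejection sampling: redraw any sample that falls outside the bounds, which yields a truncated normal distribution. A minimal sketch; the helper truncated_normal is hypothetical, the bounds are the question's ranges, and the seed is arbitrary:

```python
import numpy as np

def truncated_normal(rng, mean, std, low, high, size):
    """Hypothetical helper: normal samples, redrawing any outside [low, high].

    Simple rejection sampling; efficient only when a reasonable share of
    the distribution's mass lies inside the bounds.
    """
    out = rng.normal(mean, std, size)
    bad = (out < low) | (out > high)
    while bad.any():
        # Redraw only the out-of-range entries
        out[bad] = rng.normal(mean, std, int(bad.sum()))
        bad = (out < low) | (out > high)
    return out

rng = np.random.default_rng(3)  # arbitrary seed
# Means/deviations as in the example above, bounds from the question
age = truncated_normal(rng, 40, 20, 18, 50, 500)
hours = truncated_normal(rng, 5, 2, 0, 24, 500)
```

Clipping with np.clip would also keep values in range, but it piles every out-of-range sample onto the boundary values instead of redistributing them.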

You can also have a look at NumPy's documentation on random sampling: https://docs.scipy.org/doc/numpy/reference/routines.random.html

Finally, note that this column-wise approach samples each column independently, so it only works when you want the two variables to be independent.
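If the two variables should instead be correlated, one alternative (not part of the answer above) is to draw both columns jointly from a multivariate normal distribution with a chosen covariance matrix. A minimal sketch; the correlation of 0.3 between age and viewing time is an assumed value for illustration, not data, and the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

mean = np.array([40.0, 5.0])  # [mean age, mean hours], as in the example above
std = np.array([20.0, 2.0])   # [deviation age, deviation hours]
rho = 0.3                     # assumed correlation between age and hours

# Covariance matrix: variances on the diagonal, rho * std_i * std_j off it
cov = np.array([[std[0] ** 2,           rho * std[0] * std[1]],
                [rho * std[0] * std[1], std[1] ** 2]])

sample = rng.multivariate_normal(mean, cov, size=500)  # shape (500, 2)
print(np.corrcoef(sample[:, 0], sample[:, 1])[0, 1])   # close to 0.3
```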

federicober