Column-wise multiplication in numpy
You can easily create random arrays of any shape with numpy, using numpy.random.rand(d0, d1, …, dn) for uniform distributions or numpy.random.randn(d0, d1, …, dn) for normal distributions, where dn is the number of samples along the nth dimension. In your case you'll have d0=500 and d1=2.
However, numpy.random.rand(d0, d1, …, dn) samples values from the interval [0, 1), and numpy.random.randn(d0, d1, …, dn) samples from the standard normal distribution (i.e. mean = 0 and variance = 1).
A nice workaround for this is to multiply and add to the arrays column-wise to shift the distributions to the desired values. To multiply an array arr column-wise by a vector vec, you can use this small snippet of code: arr.dot(np.diag(vec)). Be careful: vec should have as many elements as arr has columns.
This snippet works by turning vec into a diagonal matrix (i.e. a matrix where everything is zero except the main diagonal) and then multiplying arr by that diagonal matrix.
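As a quick check, the sketch below compares the diagonal-matrix product with NumPy broadcasting (arr * vec), which gives the same column-wise scaling without building the diagonal matrix. The values in arr and vec are only illustrative.

```python
import numpy as np

# Illustrative values: 3 rows, 2 columns, one scale factor per column.
arr = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])
vec = np.array([10.0, 100.0])

scaled_diag = arr.dot(np.diag(vec))  # multiply by a diagonal matrix
scaled_broadcast = arr * vec         # broadcasting: vec is applied to every row

print(scaled_diag)
print(np.allclose(scaled_diag, scaled_broadcast))
```

Broadcasting avoids allocating a square matrix, which starts to matter when arr has many columns.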
Uniform distributions
Remember that to turn a sample x from a uniform distribution on [0, 1) into one on [min, max), you do new_x = (max - min) * x + min. So if you want a uniform distribution and you know the max and min limits for both variables, you can use the following code:
import numpy as np
n_samples = 500
max_age, min_age = 80, 10
max_hours, min_hours = 10, 0
array = np.random.rand(n_samples, 2)  # returns samples from the uniform distribution on [0, 1)
range_vector = np.array([max_age - min_age, max_hours - min_hours])
min_vector = np.array([min_age, min_hours])
sample = array.dot(np.diag(range_vector)) + np.ones(array.shape).dot(np.diag(min_vector))
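A shorter route to the same kind of sample, sketched here on the assumption that broadcasting is acceptable in place of the diagonal-matrix product. It reuses the age and hours limits from the snippet above; max_vector and sample_direct are illustrative names. The second option relies on np.random.uniform accepting per-column low and high bounds.

```python
import numpy as np

n_samples = 500
min_vector = np.array([10, 0])    # min_age, min_hours
max_vector = np.array([80, 10])   # max_age, max_hours

# Option 1: scale and shift [0, 1) samples with broadcasting.
array = np.random.rand(n_samples, 2)
sample = (max_vector - min_vector) * array + min_vector

# Option 2: let numpy draw from [low, high) for each column directly.
sample_direct = np.random.uniform(low=min_vector, high=max_vector, size=(n_samples, 2))

# Per-column minimum and maximum, to confirm the ranges.
print(sample.min(axis=0), sample.max(axis=0))
```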
Normal distributions
If you want a normal distribution and you know the mean and standard deviation of both columns, use the following code. Remember that to shift a sample x from a standard normal distribution to a distribution with a different mean and standard deviation, you do new_x = deviation * x + mean.
import numpy as np
n_samples = 500
mean_age, deviation_age = 40, 20
mean_hours, deviation_hours = 5, 2
array = np.random.randn(n_samples, 2)  # returns samples from the standard normal distribution
deviation_vector = np.array([deviation_age, deviation_hours])
mean_vector = np.array([mean_age, mean_hours])
sample = array.dot(np.diag(deviation_vector)) + np.ones(array.shape).dot(np.diag(mean_vector))
Be careful, however: with normal distributions you can end up with negative values.
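The sketch below, which assumes the same means and standard deviations as above, draws the sample with broadcasting, prints its statistics, and counts the negative values. Clipping with np.clip is shown only as one possible way to handle them; it distorts the distribution slightly, so whether it is acceptable depends on your use case.

```python
import numpy as np

n_samples = 500
mean_vector = np.array([40, 5])        # mean_age, mean_hours
deviation_vector = np.array([20, 2])   # deviation_age, deviation_hours

# randn draws from the standard normal distribution (mean 0, std 1).
array = np.random.randn(n_samples, 2)
sample = deviation_vector * array + mean_vector

print(sample.mean(axis=0))  # roughly [40, 5]
print(sample.std(axis=0))   # roughly [20, 2]

# Count negative values per column, then clip them to zero (one possible fix).
n_negative = (sample < 0).sum(axis=0)
clipped = np.clip(sample, 0, None)
print(n_negative)
```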
You can also have a look at all the documentation numpy has on random variables: https://docs.scipy.org/doc/numpy/reference/routines.random.html
Finally, please note that column-wise multiplication only works when you want both variables to be independent.
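If you do need correlated columns, one option (not covered above, so treat it as an assumption about what you want) is np.random.multivariate_normal, which takes a full covariance matrix. The covariance value cov_age_hours below is hypothetical and only there to make the contrast visible.

```python
import numpy as np

n_samples = 500
mean_vector = np.array([40, 5])
deviation_age, deviation_hours = 20, 2

# Independent columns: scale standard normal samples column by column.
independent = np.random.randn(n_samples, 2) * [deviation_age, deviation_hours] + mean_vector
print(np.corrcoef(independent, rowvar=False))  # off-diagonal entries close to 0

# Correlated columns: hypothetical covariance between age and hours.
cov_age_hours = 20.0
cov = np.array([[deviation_age ** 2, cov_age_hours],
                [cov_age_hours, deviation_hours ** 2]])
correlated = np.random.multivariate_normal(mean_vector, cov, size=n_samples)
print(np.corrcoef(correlated, rowvar=False))   # off-diagonal entries around 0.5
```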