I have a 2-D NumPy array X with shape (n_samples, n_features). I want to apply masking noise to each sample, i.e. each row: for each row, I want to randomly select a fraction frac of the n_features elements and set them to 0.
I have vectorized the inner part of the loop so far, but I cannot get rid of the outer loop over i.
My current code is given below.
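    import numpy as np

    def add_noise(X, frac):
        """Zero out a random fraction `frac` of the features in each row of X."""
        X_noise = X.copy()
        n_samples, n_features = X.shape
        n_masked = int(frac * n_features)
        for i in range(n_samples):
            # Sample distinct column indices so exactly n_masked entries are zeroed
            # (np.random.randint can repeat indices and zero fewer than intended).
            mask = np.random.choice(n_features, n_masked, replace=False)
            X_noise[i, mask] = 0
        return X_noise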
An example is shown below.
    test_arr = np.arange(1, 11)
    test_arr = np.array([test_arr, test_arr])
    print(test_arr)
    print(add_noise(test_arr, 0.3))

    [[ 1  2  3  4  5  6  7  8  9 10]
     [ 1  2  3  4  5  6  7  8  9 10]]
    [[ 1  0  3  4  5  6  0  8  9  0]   # 0.3 * n_features = 3 random elements
     [ 0  2  3  4  5  6  7  0  0 10]]  # in each row are set to 0
How do I get rid of the outer loop?
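For reference, here is a minimal sketch of one possible loop-free approach, not necessarily the fastest one. It assumes that exactly int(frac * n_features) distinct positions should be zeroed in every row. The idea is to argsort a matrix of uniform random numbers, which gives an independent random permutation of the column indices for each row, and then keep the first k columns of that permutation. The name add_noise_vectorized and the optional rng argument are my own additions for illustration.

```python
import numpy as np

def add_noise_vectorized(X, frac, rng=None):
    """Zero out int(frac * n_features) distinct random entries in every row of X."""
    # rng is a hypothetical convenience parameter so results can be reproduced.
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_features = X.shape
    k = int(frac * n_features)

    X_noise = X.copy()
    # Argsorting i.i.d. uniform noise along axis 1 yields an independent
    # random permutation of the column indices for each row.
    perm = np.argsort(rng.random((n_samples, n_features)), axis=1)
    cols = perm[:, :k]                    # shape (n_samples, k)
    rows = np.arange(n_samples)[:, None]  # shape (n_samples, 1), broadcasts against cols
    X_noise[rows, cols] = 0               # advanced indexing assigns all rows at once
    return X_noise

test_arr = np.tile(np.arange(1, 11), (2, 1))
noisy = add_noise_vectorized(test_arr, 0.3, rng=np.random.default_rng(0))
print(noisy)
```

The trade-off is that this sorts every row, so it costs O(n_features log n_features) per sample and allocates a full random matrix of the same shape as X. If k is much smaller than n_features, np.argpartition on the same random matrix might be a cheaper way to pick the k positions.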