1

I want to create a uint16 image of gaussian noise with a defined mean and standard deviation.

I've tried using numpy's random.normal for this but it returns a float64 array:

mu = 10
sigma = 100
shape = (1024,1024)
gauss_img = np.random.normal(mu, sigma, shape)

print(gauss_img.dtype)

>>> dtype('float64')

Is there a way to convert gauss_img to a uint16 array while preserving the original mean and standard deviation? Or is there another approach entirely to creating a uint16 noise image?

EDIT: As was mentioned in the comments, np.random.normal will inevitably sample negative values given a sd > mean, which is a problem for converting to uint16.

So I think I need a different method that will create an unsigned gaussian image directly.

holastello
  • 593
  • 1
  • 7
  • 16
  • Possible duplicate of [Convert ndarray from float64 to integer](https://stackoverflow.com/questions/8855574/convert-ndarray-from-float64-to-integer) – mkrieger1 Feb 08 '19 at 19:33
  • `gauss_img = gauss_img.astype(np.int16)` should do the trick. – Victor Valente Feb 08 '19 at 19:37
  • @mkrieger1 @Victor Valente simply using `astype` doesn't preserve the original mean and standard deviation – holastello Feb 08 '19 at 19:37
  • @holastello I think that is due to the fact that when downcasting you lose precision. But not so much in this specific case. Less then .1 difference per my code. – Victor Valente Feb 08 '19 at 19:40
  • You probably have to rescale the data before converting it. But you cannot avoid that some truncation happens when converting from 64-bit floating-point to 16-bit integer numbers. – mkrieger1 Feb 08 '19 at 19:41
  • it will be impossible for uint16 to maintain the original mean and std deviation, because mean 10 and std dev 100 will put nearly half your data < 0, and uint16 can't hold negative numbers. – kmh Feb 08 '19 at 19:41
  • @VictorValente hmm ok true. but I need it to be a uint16. How to convert from int16 to uint16? – holastello Feb 08 '19 at 19:41
  • Do you have evidence that the mean and standard deviation change more than you can tolerate? – mkrieger1 Feb 08 '19 at 19:44
  • @mkrieger1 `gauss_img.astype(np.int16)` gives a result that has a very similar mean/sd, but I need the image to be uint16. `gauss_img.astype(np.uint16)` gives a result with a very different mean/sd. – holastello Feb 08 '19 at 19:46
  • What `mu` and `sigma` are you using? As was already said, `uint16` (by definition) can't represent negative values, so the distribution gets skewed. Choose different parameters, or scale appropriately, so that most values are within the possible range (0 to 65535). – mkrieger1 Feb 08 '19 at 19:47
  • @mkrieger1 I'm trying to match the mean/sd of another uint16 image. One example image has mean=17 and sd=32. It seems I need a completely different strategy for making an unsigned gaussian array in the first place. – holastello Feb 08 '19 at 19:53
  • 1
    the problem here is that you're using mean and std dev to try and describe a skewed distribution, but then you're sampling from a normal distribution. two ideas... sometimes if you do one or more log transforms (np.log1p()) on a skewed dist, it will become more "normal" like. You can log transform your original data till it looks normal, then use the mean/std dev from that to generate new data, then np.expm1() it back to the original range. Alternatively you can investigate scipi.stats.skewnorm. – kmh Feb 08 '19 at 20:10
  • @kmh I think you've nailed my problem! It seems like scipi.stats.skewnorm could be a good solution, but I would first need to extract these parameters from the uint16 image that I'm trying to match. Any idea how to do this? – holastello Feb 08 '19 at 20:20

3 Answers3

1

So I think this is close to what you're looking for.

Import libraries and spoof some skewed data. Here, since the input is of unknown origin, I created skewed data using np.expm1(np.random.normal()). You could use skewnorm().rvs() as well, but that's kind of cheating since that's also the lib you'll use to characterize it.

I flatten the raw samples to make plotting histograms easier.

import numpy as np
from scipy.stats import skewnorm

# generate dummy raw starting data
# smaller shape just for simplicity
shape = (100, 100)
raw_skewed = np.maximum(0.0, np.expm1(np.random.normal(2, 0.75, shape))).astype('uint16')

# flatten to look at histograms and compare distributions
raw_skewed = raw_skewed.reshape((-1))

Now find the params that characterize your skewed data, and use those to create a new distribution to sample from that hopefully matches your original data well.

These two lines of code are really what you're after I think.

# find params
a, loc, scale = skewnorm.fit(raw_skewed)

# mimick orig distribution with skewnorm
new_samples = skewnorm(a, loc, scale).rvs(10000).astype('uint16')

Now plot the distributions of each to compare.

plt.hist(raw_skewed, bins=np.linspace(0, 60, 30), hatch='\\', label='raw skewed')
plt.hist(new_samples, bins=np.linspace(0, 60, 30), alpha=0.65, color='green', label='mimic skewed dist')
plt.legend()

enter image description here

The histograms are pretty close. If that looks good enough, reshape your new data to the desired shape.

# final result
new_samples.reshape(shape)

Now... here's where I think it probably falls short. Take a look at the heatmap of each. The original distribution had a longer tail to the right (more outliers that skewnorm() didn't characterize).

This plots a heatmap of each.

# plot heatmaps of each
fig = plt.figure(2, figsize=(18,9))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)

im1 = ax1.imshow(raw_skewed.reshape(shape), vmin=0, vmax=120)
ax1.set_title("raw data - mean: {:3.2f}, std dev: {:3.2f}".format(np.mean(raw_skewed), np.std(raw_skewed)), fontsize=20)

im2 = ax2.imshow(new_samples.reshape(shape), vmin=0, vmax=120)
ax2.set_title("mimicked data - mean: {:3.2f}, std dev: {:3.2f}".format(np.mean(new_samples), np.std(new_samples)), fontsize=20)

plt.tight_layout()

# add colorbar
fig.subplots_adjust(right=0.85)
cbar_ax = fig.add_axes([0.88, 0.1, 0.08, 0.8])  # [left, bottom, width, height]
fig.colorbar(im1, cax=cbar_ax)

Looking at it... you can see occasional flecks of yellow indicating very high values in the original distribution that didn't make it into the output. This also shows up in the higher std dev of the input data (see titles in each heatmap, but again, as in comments to original question... mean & std don't really characterize the distributions since they're not normal... but they're in as a relative comparison).

But... that's just the problem it has with the very specific skewed sample i created to get started. There's hopefully enough here to mess around with and tune until it suits your needs and your specific dataset. Good luck!

enter image description here

kmh
  • 1,516
  • 17
  • 33
  • 1
    This is fantastic. Thank you so much. I asked this question in more specific terms here: https://stackoverflow.com/questions/54600220/create-a-random-noise-image-with-the-same-mean-and-skewed-distribution-as-anot?noredirect=1#comment95999190_54600220. If you put or link this answer there I'll accept it there too (where I think it fits better with the question) – holastello Feb 09 '19 at 00:02
0

With that mean and sigma you are bound to sample some negative values. So i guess the option could be that you find the most negative value, after sampling, and add its absolute value to all the samples. After that convert to uint as suggested in the comments. But ofcourse you loose the mean this way.

Amuoeba
  • 624
  • 9
  • 28
0

If you have a range of uint16 numbers to sample from, then you should check out this post.

This way you could use scipy.stats.truncnorm to generate a gaussian of unsigned integers.

Victor Valente
  • 761
  • 9
  • 24