3

I have two variables, let's call them x and y, which, when plotted, are the scattered blue points in the graph. I have fitted them using curve_fit from SciPy.

I want to generate (let's say 500000) "smoothed" random numbers that replicate the distribution followed by x and y.

[Figure: scatter plot of x and y (blue points) with the curve_fit result]

By "smoothed" I mean that I don't want randoms that exactly replicate my data (x and y), as in the figure below, where the red diamonds are my data distribution and the histogram is my generated randoms (even the fluctuations of the data are replicated here!). I want a "smoothed" histogram.

[Figure: histogram of the generated randoms, with the data distribution shown as red diamonds]

What I have tried so far is to fit the points x and y using curve_fit from SciPy, so I now know what the data distribution is. What I still need is to create random numbers that follow this fit/distribution.

P.S. I have also tried generating uniform randoms from 0 to 1 and keeping only the points below the fitted curve, but I don't know how to do that!

Srivatsan

2 Answers

3

I propose that you take your data distribution fit and then add some random "noise" to it. This should produce data that still follows your distribution but is randomised for whatever purpose you require.

Below is some code which takes a data distribution fit (in the function curve) and then randomises the values obtained from it using the numpy.random module.

import numpy as np
import matplotlib.pyplot as plt

# I don't have your data but let's assume that this function 
# replicates the data distribution you want to work with.
def curve(x):
    return 2. * x + 5.

N = 100
x = np.linspace(0, 1, N)
y_fit = curve(x)

# margin is the full width of the uniform noise band:
# the noise lies in [-margin/2, margin/2).
margin = 0.5

noise = margin*(np.random.random(N)-0.5)
y_ran = y_fit + noise

plt.plot(x, y_fit) # Plot the fitted distribution.
plt.plot(x, y_ran, 'rx') # Plot the noisy data.

plt.show()

Note that this only creates 100 randomised results; you can modify the code to generate as many as you need.

[Figure: the fitted line with the noisy points shown as red crosses]
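For the case where the fit function takes the measured x as input, one possible approach is to fit on the measured points and then evaluate the fitted function at freshly drawn x values. The sketch below is only an illustration under stated assumptions: `model`, `x_data` and `y_data` are hypothetical stand-ins for the real fit function and the real 25 data points, and the noise is the same uniform band as in the code above.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothetical fit function; substitute the one given to curve_fit.
    return a * x + b

rng = np.random.default_rng(0)

# Stand-in for the 25 measured points (assumed, not the real data).
x_data = np.linspace(0.0, 1.0, 25)
y_data = 2.0 * x_data + 5.0 + rng.normal(0.0, 0.1, x_data.size)

# Fit on the measured points only.
params, _ = curve_fit(model, x_data, y_data)

# Evaluate the fitted model at as many new x values as needed.
N = 500_000
x_new = rng.uniform(x_data.min(), x_data.max(), N)

# Same uniform noise band as above: values in [-margin/2, margin/2).
margin = 0.5
y_new = model(x_new, *params) + margin * (rng.random(N) - 0.5)
```

Because the fitted function can be called with any array of x values, the number of generated points does not have to match the number of data points.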

Ffisegydd
  • So the randoms generated here are the blue line fit, right? – Srivatsan Jun 05 '14 at 13:04
  • The problem is that you set x = np.linspace(0,1,100) and y_fit = curve(x), but my fit function takes as input the x from the data distribution. – Srivatsan Jun 05 '14 at 13:09
  • I have constructed x and y manually as I don't have your data, this code is simply proof-of-concept to show a possible way for you to proceed. You'll have to do the actual work yourself. – Ffisegydd Jun 05 '14 at 13:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/55137/discussion-between-srivatsan-and-ffisegydd). – Srivatsan Jun 05 '14 at 13:38
  • The problem is that it generates randoms only within a small area, i.e. along the fitted curve. I have added the picture of the generated randoms in the chatroom. How should I generate 500000 points within the fitted curve? – Srivatsan Jun 10 '14 at 10:01
1

What I think you might be able to do is to rescale your fit so that its values lie in the y-range [0,1] (for example, by dividing by its maximum), and then start the following loop:

  • generate a random x value
  • for this x value, generate a y value in the range [0,1]
  • if this y value is below the value of the rescaled fit at that x value, accept it, otherwise discard the x-y pair and go to the next iteration of the loop

This should give you a bunch of random numbers that follow your smoothed distribution.
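The loop described above is usually called rejection sampling, and it can be sketched in batches with NumPy. This is a minimal sketch under assumptions, not a definitive implementation: `fitted_curve` is a hypothetical stand-in for the real fitted function (assumed non-negative on the sampled range), `rejection_sample` and its `batch` parameter are illustrative names, and the curve's maximum is only estimated on a dense grid.

```python
import numpy as np

def fitted_curve(x):
    # Hypothetical stand-in for the fitted function.
    return np.exp(-0.5 * ((x - 0.5) / 0.15) ** 2)

def rejection_sample(curve, x_min, x_max, n, rng, batch=100_000):
    # Rescale the curve to [0, 1] using its maximum on a dense grid.
    grid = np.linspace(x_min, x_max, 10_001)
    peak = curve(grid).max()

    accepted = []
    count = 0
    while count < n:
        # Draw a batch of candidate (x, y) pairs.
        x = rng.uniform(x_min, x_max, batch)
        y = rng.uniform(0.0, 1.0, batch)
        # Keep the x values whose y falls below the rescaled curve.
        keep = x[y < curve(x) / peak]
        accepted.append(keep)
        count += keep.size
    return np.concatenate(accepted)[:n]

rng = np.random.default_rng(42)
samples = rejection_sample(fitted_curve, 0.0, 1.0, 500_000, rng)
```

A histogram of `samples` should then trace the smooth fitted curve rather than the fluctuations of the original data points. Working in batches avoids a slow Python loop over individual pairs.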

MultiVAC
  • This is what I was thinking and have also mentioned in the question at the end. But the problem is that the fit has only 25 values, so for generating 500000 randoms, the loop does not work!! – Srivatsan Jun 05 '14 at 12:57
  • @Srivatsan : that is odd... what exactly goes wrong with the loop? I cannot immediately see what could go wrong, even if you have only 25 values – MultiVAC Jun 05 '14 at 13:08
  • If there are 25 data values and my randoms number 500000, then when I try to run the loop it says "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" – Srivatsan Jun 10 '14 at 09:43
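The error quoted in the last comment is what NumPy raises when a whole array is used as the condition of an `if` statement, which is a likely cause here. One possible way around it, sketched below, is to interpolate the 25 fit values at every candidate x and then select with a boolean mask instead of `if`. All names and the fit values themselves (`x_fit`, `y_fit`, `x_rand`, `y_rand`) are hypothetical stand-ins, and `np.interp` is a swapped-in linear interpolation; calling the fitted function directly on `x_rand` would serve the same purpose.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: 25 fit values, rescaled to a maximum of 1.
x_fit = np.linspace(0.0, 1.0, 25)
y_fit = np.exp(-0.5 * ((x_fit - 0.5) / 0.15) ** 2)
y_fit = y_fit / y_fit.max()

# A large batch of candidate points.
x_rand = rng.uniform(0.0, 1.0, 500_000)
y_rand = rng.uniform(0.0, 1.0, 500_000)

# Interpolate the 25 fit values at every candidate x so the shapes match.
y_at_x = np.interp(x_rand, x_fit, y_fit)

# `if y_rand < y_at_x:` would raise the ValueError quoted above, because
# the comparison yields an array of booleans rather than a single one.
# A boolean mask selects the accepted candidates element by element.
mask = y_rand < y_at_x
x_accepted = x_rand[mask]
```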