The road network dataset I have consists of edges and nodes. I mostly work with the edges, which have (edge_id, start_node, end_node, edge_length). To simulate a real-world scenario, I need to generate random road objects, i.e. Points of Interest (POIs), on the map. Each POI has the attributes (Object_id, edge_id, distance_from_start_node, edge_type [boolean]). To generate 10k road objects following a Gaussian distribution, I randomly choose one edge from the whole edge dataset, repeat this 10k times, and each time pick a Gaussian-distributed distance value. If I set the mean to zero (0), I have no idea what standard deviation I should use. I can't just use an arbitrary standard deviation because of the edge lengths in the dataset. Can anybody help me generate a random double value that follows a Gaussian distribution?
Random.nextGaussian, as you know, generates a random number from a normal distribution with a mean of zero and a standard deviation of one.
That means roughly half of the numbers coming out of that generator will be negative. Negative distances probably make no sense in your application, so you should plan to avoid them.
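To see how often the raw generator goes negative, a quick empirical check like the following can help. It is only a sketch: the class name `GaussianCheck`, the fixed seed, and the sample size are arbitrary choices for illustration. With a large sample the mean should be close to 0, the standard deviation close to 1, and about half of the values negative, because the standard normal distribution is symmetric around zero.

```java
import java.util.Random;

// Hypothetical helper: summarizes samples drawn from Random.nextGaussian().
class GaussianCheck {
    // Returns {sample mean, sample standard deviation, fraction of negative samples}.
    static double[] summarize(long seed, int n) {
        Random rnd = new Random(seed); // fixed seed so runs are reproducible
        double sum = 0, sumSq = 0;
        int negatives = 0;
        for (int i = 0; i < n; i++) {
            double z = rnd.nextGaussian();
            sum += z;
            sumSq += z * z;
            if (z < 0) negatives++;
        }
        double mean = sum / n;
        double variance = sumSq / n - mean * mean; // population variance
        return new double[] {mean, Math.sqrt(variance), (double) negatives / n};
    }

    public static void main(String[] args) {
        double[] s = summarize(42L, 100_000);
        System.out.printf("mean=%.3f stdev=%.3f negative fraction=%.3f%n", s[0], s[1], s[2]);
    }
}
```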
Why not try something like this to get your random numbers (not debugged):
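```java
private static final java.util.Random rnd = new java.util.Random();

// Returns a non-negative distance: a normal sample with mean 10 and
// standard deviation 5, with negative samples rejected and redrawn.
public static double nextDistance() {
    final double meanLength = 10.0;
    final double stdevLength = 5.0;
    while (true) {
        double result = meanLength + (stdevLength * rnd.nextGaussian());
        if (result >= 0) return result; // otherwise loop and draw again
    }
}
```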
This returns a non-negative distance drawn from a normal distribution with a mean of 10 and a standard deviation of 5. If the scaled and offset Gaussian sample turns out to be negative, the function just tries again. Discarding those samples skews the distribution slightly (the accepted values average a little above 10), but for test data it's probably OK.
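The question's mean-0 case can be handled along the same lines. The sketch below is one possible approach, not something taken from the answer above: it assumes σ is set per edge as a fraction of edge_length (the hypothetical `sigmaFraction` parameter), so the spread scales with each edge instead of being one arbitrary global value. A normal distribution with mean 0 restricted to non-negative values is the half-normal distribution, so taking `Math.abs` of the sample is equivalent to rejecting negatives and never wastes a draw; distances past the end of the edge are still rejected and redrawn. The `Edge` and `Poi` records (Java 16+) are assumptions that mirror the question's fields; `edge_type` is left out because its meaning isn't specified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Mirrors the question's edge fields (edge_id, start_node, end_node, edge_length).
record Edge(int edgeId, int startNode, int endNode, double edgeLength) {}

// Mirrors the question's POI fields; edge_type is omitted (meaning not specified).
record Poi(int objectId, int edgeId, double distanceFromStartNode) {}

class PoiGenerator {
    private final Random rnd;
    private final double sigmaFraction; // hypothetical: sigma = sigmaFraction * edgeLength

    PoiGenerator(long seed, double sigmaFraction) {
        this.rnd = new Random(seed);
        this.sigmaFraction = sigmaFraction;
    }

    // Distance from the start node: mean 0, sigma proportional to the edge length.
    // |N(0, sigma)| is half-normal, i.e. N(0, sigma) with negatives rejected.
    // Draws beyond the end of the edge are rejected and redrawn.
    double nextDistance(Edge e) {
        double sigma = sigmaFraction * e.edgeLength();
        while (true) {
            double d = Math.abs(sigma * rnd.nextGaussian());
            if (d <= e.edgeLength()) return d;
        }
    }

    // For each POI, picks an edge uniformly at random, as described in the question.
    List<Poi> generate(List<Edge> edges, int count) {
        List<Poi> pois = new ArrayList<>(count);
        for (int id = 0; id < count; id++) {
            Edge e = edges.get(rnd.nextInt(edges.size()));
            pois.add(new Poi(id, e.edgeId(), nextDistance(e)));
        }
        return pois;
    }
}
```

With `sigmaFraction = 0.5`, a raw draw lands past the end of the edge only when |Z| > 2, roughly 5% of the time, so the loop rarely repeats. Smaller fractions cluster POIs near the start node; larger ones spread them more evenly along the edge at the cost of more rejected draws.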
I wonder, though, whether your application should use a Poisson distribution rather than a Gaussian one. That might describe your distribution of edge lengths better. You can generate Poisson random numbers as well.
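`java.util.Random` has no Poisson method, so one option is Knuth's multiplication algorithm, sketched below. The class name `PoissonSampler` and the seed are illustrative. The algorithm multiplies uniform draws until the running product falls to e^(-λ) or below; the number of multiplications minus one is the sample. It suits modest λ: the expected work grows linearly with λ, and e^(-λ) underflows to zero for very large λ.

```java
import java.util.Random;

// Hypothetical helper: Poisson sampling via Knuth's multiplication method.
class PoissonSampler {
    private final Random rnd;

    PoissonSampler(long seed) {
        this.rnd = new Random(seed);
    }

    // Returns one Poisson(lambda) sample; expected work grows linearly with lambda.
    int next(double lambda) {
        double limit = Math.exp(-lambda);
        int k = 0;
        double p = 1.0;
        do {
            k++;
            p *= rnd.nextDouble(); // multiply in another uniform draw from [0, 1)
        } while (p > limit);
        return k - 1;
    }
}
```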

I have found another solution. I will generate random Gaussian values using the XY coordinates of each node, in Euclidean space. Then I will overlay the edge dataset on top of that space, keep the data points that lie on the edges, and discard the others. I will iterate until I get the required number of distances. – Aavash Bhandari Nov 25 '20 at 16:13
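The overlay idea in the comment above could be sketched roughly as follows. This is an interpretation, not the commenter's code: it assumes each node has (x, y) coordinates and each edge is a straight segment between its two nodes. A point drawn from a continuous 2D distribution lands exactly on a line with probability zero, so the sketch keeps points within a hypothetical `tolerance` of the segment and reports where they project onto it, which gives a distance_from_start_node value. The class name `OverlaySampler` is also hypothetical.

```java
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch: scatter Gaussian points around a node, keep those within
// `tolerance` of an edge segment, and record where they project onto the edge.
class OverlaySampler {
    private final Random rnd;

    OverlaySampler(long seed) {
        this.rnd = new Random(seed);
    }

    // Draws a point from an isotropic 2D Gaussian centred on (cx, cy).
    double[] samplePoint(double cx, double cy, double sigma) {
        return new double[] {cx + sigma * rnd.nextGaussian(), cy + sigma * rnd.nextGaussian()};
    }

    // If point p lies within `tolerance` of segment a-b, returns the distance from a
    // to p's projection onto the segment; otherwise returns empty.
    static Optional<Double> snapToEdge(double[] p, double[] a, double[] b, double tolerance) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double lenSq = dx * dx + dy * dy;
        if (lenSq == 0) return Optional.empty(); // degenerate (zero-length) edge
        // Projection parameter t, clamped so the foot point stays on the segment.
        double t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lenSq;
        t = Math.max(0.0, Math.min(1.0, t));
        double fx = a[0] + t * dx, fy = a[1] + t * dy;
        double dist = Math.hypot(p[0] - fx, p[1] - fy);
        return dist <= tolerance ? Optional.of(t * Math.sqrt(lenSq)) : Optional.empty();
    }
}
```

Repeating `samplePoint` followed by `snapToEdge` until enough points are accepted matches the "iterate until I get the required number of distances" step; a larger tolerance accepts more points but blurs what "on the edge" means.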