
I'm sure the opposite has been asked many times, but I couldn't find any answers on how to generate bad random numbers.

I want to write a small program for cluster analysis and want to generate some random points for testing. If I just inserted 1000 points with random coordinates, they would be scattered all over the field, which would make a cluster analysis worthless.

Is there a simple way to generate random numbers that form clusters?

I already thought about using random()*random() instead of random(), which I believe generates normally distributed numbers (I think I read this somewhere here on Stack Overflow).
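For reference, the product of two independent uniform values on [0, 1) is not normally distributed: its density is -ln(x) on (0, 1), so values pile up near 0, the mean is 1/4, and P(x < 0.5) = 0.5 + 0.5·ln 2 (about 0.847). The quick Python check below (Python is an assumption; the question names no language) samples the product and compares it with those values.

```python
import random

def product_of_uniforms(n, seed=0):
    """Draw n values of random() * random() from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() * rng.random() for _ in range(n)]

samples = product_of_uniforms(100_000)
mean = sum(samples) / len(samples)
below_half = sum(1 for s in samples if s < 0.5) / len(samples)
print(f"mean = {mean:.3f}")              # theoretical value: 0.25
print(f"P(x < 0.5) = {below_half:.3f}")  # theoretical value: about 0.847
```

A skewed, one-sided shape like this will not form clusters on its own; it just pushes points toward one corner of the field.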

A second approach would be to pick a few areas at random and run the point generation again within each of those areas, which would of course produce a cluster there.

Do you have a better idea?

Amro
Nicolas
  • What you said: decide on either a distribution or clusters and generate random numbers using that as the probability density function. – Konrad Rudolph Nov 04 '10 at 16:17
  • 7
    Reminds me of http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/2000/300/2318/2318.strip.gif – Gumbo Nov 04 '10 at 16:19
  • I assume you're talking about this question: http://stackoverflow.com/questions/3956478/understanding-randomness when talking about `random()*random()` – Yi Jiang Nov 04 '10 at 16:43

5 Answers


If you want deliberately well-formed clusters (rather than completely random ones), you could combine your two ideas: pick a random cluster center, then place lots of points around it with a normal distribution.
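A minimal Python sketch of that combination, assuming a square field and using the standard library's `random.gauss` for the normal scatter. The parameter names `width` and `spread` are hypothetical choices for illustration.

```python
import random

def gaussian_clusters(n_clusters, points_per_cluster, width=100.0,
                      spread=5.0, seed=None):
    """Pick uniform cluster centers, then scatter normal points around each.

    width:  side length of the square field the centers are drawn from
    spread: standard deviation of the points around their center
    """
    rng = random.Random(seed)
    centers = [(rng.uniform(0, width), rng.uniform(0, width))
               for _ in range(n_clusters)]
    points = []
    for cx, cy in centers:
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return centers, points

centers, points = gaussian_clusters(5, 200, seed=42)
```

Because the normal distribution is unbounded, a few points can land outside the field; clip or redraw them if your analysis needs strict bounds. Keeping `centers` around is handy for checking whether the cluster analysis recovers them.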

As well as working in Cartesian coordinates (x, y), you could use a radial method to distribute the points of a particular cluster. Choose a random angle (0 to 2π radians), then choose a radius. Note that because circumference is proportional to radius, a uniformly chosen radius makes the area density higher close to the center, even though every radius gets the same share of points. Modify the radial distribution to produce a more tightly packed cluster.
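The radial method can be sketched in Python as follows. Shaping the radius as `max_radius * u ** power` is one assumed way to "modify the radial distribution", not the only one: `power = 1` is a uniform radius (denser center), `power = 0.5` spreads points evenly over the disc's area, and larger powers pack the cluster more tightly.

```python
import math
import random

def radial_cluster(cx, cy, n, max_radius, power=1.0, seed=None):
    """Scatter n points around (cx, cy) using a random angle and radius.

    radius = max_radius * u ** power, with u uniform in [0, 1):
      power = 0.5 -> uniform density over the disc area
      power = 1.0 -> uniform radius, so density is higher near the center
      power > 1.0 -> an even tighter cluster around the center
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = max_radius * rng.random() ** power
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points

for p in (0.5, 1.0, 2.0):
    pts = radial_cluster(0.0, 0.0, 10_000, max_radius=10.0, power=p, seed=1)
    inner = sum(1 for x, y in pts if math.hypot(x, y) < 5.0) / len(pts)
    print(f"power={p}: {inner:.2f} of points within half the radius")
```

In theory the fraction within half the radius is 0.5 ** (1 / power), i.e. 0.25, 0.50 and about 0.71 for the three powers, so the printed values should land close to those.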

Or you could use real-world data for semi-random point distributions with natural clustering. Recently I've been doing quite a bit of geospatial cluster analysis, and for this I have used real-world data: ZIP code centroids (which form natural clusters around cities) and restaurant locations. Another suggestion: you could use a stellar or galactic catalogue.

winwaed

Generate a few anchors using ordinary uniform random numbers. Then generate noise around them:

anchor + dist * (random() - 0.5)

This generates clustered numbers, each uniformly distributed within dist / 2 of its anchor (an interval of total width dist).
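Applied to both coordinates, the anchor-plus-noise formula looks like this in Python (a sketch; the square field of side `field` is an assumption). Each point ends up inside a square of side `dist` centered on its anchor.

```python
import random

def anchor_clusters(n_anchors, points_per_anchor, field=100.0, dist=10.0,
                    seed=None):
    """Uniform anchors, then uniform noise of total width `dist` around each.

    Each coordinate is anchor + dist * (random() - 0.5), so every point lies
    within dist / 2 of its anchor along each axis (a square, not a circle).
    """
    rng = random.Random(seed)
    anchors, points = [], []
    for _ in range(n_anchors):
        ax, ay = rng.uniform(0, field), rng.uniform(0, field)
        anchors.append((ax, ay))
        for _ in range(points_per_anchor):
            points.append((ax + dist * (rng.random() - 0.5),
                           ay + dist * (rng.random() - 0.5)))
    return anchors, points

anchors, points = anchor_clusters(4, 250, seed=7)
```

Uniform noise gives clusters with hard edges; swapping in normal noise (as in the first answer) gives softer, more realistic ones.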

Andrey
  • Add an additional dimension to your model.
  • Define an irregular (i.e. non-flat) surface in the extended space.
  • Generate uniformly random points in the extended space.
  • Discard all points that lie on one side of the surface.
  • From every point left, drop the additional dimension.
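The steps above amount to rejection sampling. In this Python sketch the "irregular surface" is assumed to be the tallest of a few Gaussian bumps at hypothetical `peaks`; the extra dimension is `z`, and points under the surface are kept. Since each bump has height 1, drawing `z` from [0, 1) covers the whole surface.

```python
import math
import random

def surface_rejection(n, peaks, width=100.0, bump=8.0, seed=None):
    """Keep uniform (x, y, z) samples that fall below a bumpy surface.

    The surface height at (x, y) is the tallest Gaussian bump among `peaks`
    (each bump has height 1 and standard deviation `bump`), so z is drawn
    uniformly from [0, 1). Surviving points have z dropped.
    """
    rng = random.Random(seed)

    def height(x, y):
        return max(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bump ** 2))
                   for px, py in peaks)

    points = []
    while len(points) < n:
        x, y, z = rng.uniform(0, width), rng.uniform(0, width), rng.random()
        if z < height(x, y):       # below the surface: keep it
            points.append((x, y))  # drop the additional dimension
    return points

pts = surface_rejection(1000, peaks=[(20, 30), (70, 60), (40, 85)], seed=1)
```

The surviving points have a density proportional to the surface height, so any surface shape works; the cost is that most draws are discarded when the bumps are narrow.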
Debilski

Maybe I have misunderstood, but the GNU Scientific Library (written in C) has many distributions built in. Could you not pick coordinates from the Gaussian, Poisson, etc. distributions in that library?

http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Distributions.html

The linked page also provides a simple example using the Poisson distribution.

If you need your distribution to be bounded (for example, a y-coordinate no less than -1), you can achieve that by rejection sampling from the uniform distribution in GSL.
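A sketch of that bounded approach in Python, with the standard `random` module standing in for GSL's uniform generator (an assumption; the C version would follow the same logic). It proposes uniformly on [lo, hi] and accepts with probability equal to the normal density scaled so that its peak is 1, which yields a normal distribution truncated to [lo, hi].

```python
import math
import random

def truncated_gaussian(mu, sigma, lo, hi, rng):
    """Rejection sampling of a normal(mu, sigma) restricted to [lo, hi].

    Propose x uniformly on [lo, hi]; accept with probability
    exp(-(x - mu)^2 / (2 sigma^2)), the normal density scaled to peak at 1.
    Assumes lo < hi and sigma > 0.
    """
    while True:
        x = rng.uniform(lo, hi)
        if rng.random() < math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)):
            return x

rng = random.Random(7)
ys = [truncated_gaussian(0.0, 1.0, -1.0, 3.0, rng) for _ in range(1000)]
```

The acceptance rate drops when [lo, hi] is much wider than sigma, so for wide bounds it is cheaper to draw normal values directly and reject the ones out of range.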

Blessings, Tom

Tom
  • The OP does not state the environment he's using, and the GNU license might not be suitable for his project. – winwaed Nov 04 '10 at 17:06

My first thought was that you could implement your own generator as a linear congruential generator (LCG) and experiment with the coefficients until you get a period short enough to suit your needs. A really small modulus m should do the trick, since the period of an LCG can never exceed m.
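A minimal LCG sketch in Python with a deliberately tiny modulus. The coefficients a = 5, c = 3, m = 16 are illustrative choices that satisfy the Hull–Dobell conditions, so the period is exactly 16. Whether the resulting repetitive pattern looks like useful clusters for your analysis is something you would need to check; it tends to produce a small lattice of repeated positions.

```python
def lcg(seed, a=5, c=3, m=16):
    """Linear congruential generator x_{k+1} = (a * x_k + c) mod m.

    Yields floats in [0, 1). The period can never exceed m; with these
    parameters (c coprime to m, a - 1 divisible by 4) it is exactly m.
    """
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=1)
values = [next(gen) for _ in range(32)]
print(values[:16] == values[16:])  # True: the sequence repeats every 16 draws
```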

I also like your second idea of running a good RNG around a few pre-selected points to create clusters. You could either target specific areas for the clusters with this method, or generate those randomly as well.

Bill the Lizard