1

I'm trying to determine whether a set of points are uniformly distributed in a 1 x 1 x 1 cube. Each point comes with an x, y, and z coordinate that corresponds to their location in the cube.
A trivial way that I can think of is to flatten the set of points into 2 graphs and check how normally distributed both are however I do not know whether that's a correct way of doing so.
Anyone else has any idea?

CXY
  • 33
  • 8
  • Projecting the points onto some lower-dimensional space can't give the right answer, as you suspect; one can invent cases in which the project points are distributed uniformly but the original points are not. The right way to look at this is to look at the so-called discrepancy (discussed under [Low-discrepancy sequence](https://en.wikipedia.org/wiki/Low-discrepancy_sequence)); essentially a set with less discrepancy is more uniform. However, I don't know a computationally straightforward way to compute discrepancy, although there are a lot of things I don't know, so you might look for it. – Robert Dodier Feb 20 '21 at 15:56
  • A straightforward approach which is going to be pretty common is to cut the cube up into boxes and count the number of points in each box and look at (counted minus expected)^2 / expected or something like that; this is identical to the chi square statistic. Anyway obviously a set is more uniform when that quantity is smaller. I would steer away from a chi square test, since your null hypothesis is almost certainly false, but the test statistic might be useful to you. – Robert Dodier Feb 20 '21 at 16:00
  • Further discussion should probably be directed to stats.stackexchange.com or math.stackexchange.com. – Robert Dodier Feb 20 '21 at 16:01
  • @RobertDodier Thank you for the small cube insight! That'll probably be good enough as a trivial approach first. The issue is the computation must be performant and it doesn't have to be very accurate. – CXY Feb 21 '21 at 01:26

1 Answers1

1

I would compute point density map and then check for anomalies in it:

  1. definitions

    let assume we have N points to test. If the points are uniformly distributed then they should form "uniform grid" of mmm points:

    m * m * m = N
    m = N^(1/3)
    

    To account for disturbances from uniform grid and asses statistics you need to divide your cube to grid of cubes where each cube will hold several points (so statistical properties could be computed) let assume k>=5 points per grid cube so:

    cubes = m/k
    
  2. create a 3D array of counters

    simply we need integer counter per each grid cube so:

    int map[cubes][cubes][cubes];
    

    fill it with zeroes.

  3. process all points p(x,y,z) and update map[][][]

    Simply loop through all of your points, and compute grid cube position they belong to and update their counter by incrementing it.

    map[x*(cubes-1)][y*(cubes-1)][z*(cubes-1)]++;
    
  4. compute average count of the map[][][]

    simple average like this will do:

    avg=0;
    for (xx=0;xx<cubes;xx++)
     for (yy=0;yy<cubes;yy++)
      for (zz=0;zz<cubes;zz++)
       avg+=map[xx][yy][zz];
    avg/=cubes*cubes*cubes;
    
  5. now just compute abs distance to this average

    d=0;
    for (xx=0;xx<cubes;xx++)
     for (yy=0;yy<cubes;yy++)
      for (zz=0;zz<cubes;zz++)
       d+=fabs(map[xx][yy][zz]-avg);
    d/=cubes*cubes*cubes;
    

    the d will hold a metric telling how far are the points from uniform density. Where 0 means uniform distribution. So just threshold it ... the d is also depending on the number of points and my intuition tells me d>=k means totally not uniform so if you want to make it more robust you can do something like this (the threshold might need tweaking):

    d/=k;
    if (d<0.25) uniform;
     else nonuniform;
    

As you can see all this is O(N) time so it should be fast enough for you. If it isn't you can evaluate every 10th point by skipping points however that can be done only if the order of points is random. If not you would need to pick N/10 random points instead. The 10 might be any constant but you need to take in mind you still need enough points to process so the statistic results are representing your set so I would not go below 250 points (but that depends on what exactly you need)

Here few of my answers using density map technique:

Spektre
  • 49,595
  • 11
  • 110
  • 380
  • Amazing work. Thank you! I'll be sure to credit you when the time comes – CXY Feb 21 '21 at 12:12
  • Just a question, wouldn't the computation of avg be simplified by dividing numOfPoints by cubes^3? – CXY Feb 21 '21 at 12:49
  • @CXY `cubes*cubes*cubes` is the same as `cubes^3` however if you meant by using `pow` then no that would be slower. Anyway optimizing single simple operation that appears just once in comparison to per point computations has next to no effect so its not worth the time ... – Spektre Feb 21 '21 at 14:32