0

Let's say you have a 2D plot such as this one (as a list of points):

Scatter Plot

Now let's say you split up the plot into equal square areas like so:

Scatter Plot divided into sections

Question 1: If you wanted to find the density of each block (the amount of points per area), what computationally efficient techniques in C could be used (efficiency in terms of RAM usage and processing time)?

(Instead of comparing each point one by one to see if it lies in the interval of each square)

Question 2: If a new point was added, what efficient methods could be used to find which block the point resides in, and calculate the new density of the block.

(**Note: I'm not asking for the MOST efficient method as that would be opinion based)

  • Assuming you made clear what is meant by "most efficient method," why would this be opinion-based? – ad absurdum May 12 '17 at 21:10
  • Are you talking about analyzing *images* or analyzing the underlying point list? – John Bollinger May 12 '17 at 21:16
  • Analyzing it as a list of points – Robert James Mieta May 12 '17 at 21:17
  • You're going to need to look at each point to see where it lies, so it's impossible to do better than looking at each point to see where it lies. The best idea is probably just to create a 2D array of regions and count the number of points per region. This also allows easily updating the densities when adding (and removing) points. – Bernhard Barker May 12 '17 at 21:18
  • 2
    Then I suppose I would set up a 2D array whose elements correspond to the grid squares, iterate over the point list, and assign each point to a square by dividing its `x` and `y` coordinates by the grid spacing. – John Bollinger May 12 '17 at 21:19
  • Are your points' `x` and `y` coordinates bounded? Are points more "dense" than "sparse", eg there are much more squares with at least one point than empty squares? Do you need density in the exact squares, or in some regions made of squares? –  May 13 '17 at 05:08
  • Let's say a data set is given where the bounds are the max values of the set. Assume 100-1000 points, and less squares than points. If you can find densities in multiple region instead of squares that's ok (as long as it's more efficient) – Robert James Mieta May 13 '17 at 05:15
  • Building on the other comments, from a processing standpoint, you may gain a bit of efficiency if you can represent the array of regions with each row a sequence of `0`'s and `1`'s for (empty, filled) and then process each row as a series of `unsigned` values evaluating the '*population*' of `1` bits (or "Hamming Weight") 4-bytes at a time, See: [**How to count the number of set bits in a 32-bit integer?**](http://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer) – David C. Rankin May 13 '17 at 05:29

1 Answers1

0

In Java, assuming coordinates are non-negative and the number of bins is at most 2^31 on each axis:

Map<Pair<Integer, Integer>, List<Point>> binMap = new HashMap<>();
for (Point point : points) {
  Pair binXy = new Pair((int) (point.x / BIN_SIZE), (int) (point.y / BIN_SIZE));
  if (binMap.containsKey(binXy)) {
    binMap.get(binXy).add(point);
  } else {
    List<Point> pointList = new ArrayList<>();
    pointList.add(point);
    binMap.put(binXy, pointList);
  }
}

Now you can enumerate the points in each bin:

for (Entry<Pair<Integer, Integer>, List<Point>> entry : binMap.entrySet()) {
  System.out.println("Bin: " + entry.getKey() + " Points: " + entry.getValue());
}
Gene
  • 46,253
  • 4
  • 58
  • 96