
Here's my use case--it is related to small neighborhood classification, but it could be anything where, during a loop, the ground conditions change (the state / grouping of individual areas change), and further iterations must relate to the new state of the system.

At this point the problem is more conceptual than code-related, so bear with me for the lack of a reproducible example.

I am looping over a list of small areas. For each one, I run spatial queries in PostGIS to get a list of all neighboring areas.

Each area can be thought of as a geographic "seed" for a cluster: if a seed's neighbors meet a certain size criterion, they are added to the seed's list of members (the cluster "grows") and assigned a cluster ID.

Edit: Added detail

My starting point is a census tract layer in PostGIS, which I query for adjacent tracts. Say tract 1 has tracts 2, 3, 5, and 6 as its neighbors. Those tracts have a number of attributes, such as employment and population. The idea is to treat tract 1 as a potential geographic cluster seed and, for each of its neighbors, add it if it meets certain data conditions, for example a minimum population. In the example, let's say we add tracts 2 and 3, but not 5 and 6 (their population may be too small).

Seed 1 has now grown by two tracts. I will then union those into a larger area and query the original tract layer for neighbors of the now larger seed. I repeat this until no neighbors fit the criteria, then move on to the next tract in the original list.
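A minimal sketch of growing a single seed, assuming the PostGIS neighbor query is replaced by a hypothetical in-memory adjacency dictionary (`ADJACENCY`) and the data condition is a hypothetical minimum population (`MIN_POPULATION`). The key point is that the neighbors are recomputed from the current cluster on every pass, which stands in for unioning the tracts and re-querying the layer:

```python
# Hypothetical in-memory stand-in for the PostGIS adjacency query:
# each tract ID maps to the set of tract IDs it touches.
ADJACENCY = {
    1: {2, 3, 5, 6},
    2: {1, 3, 4},
    3: {1, 2, 7},
    4: {2},
    5: {1, 6},
    6: {1, 5},
    7: {3},
}

# Hypothetical attribute table (population per tract).
POPULATION = {1: 900, 2: 1200, 3: 800, 4: 1500, 5: 100, 6: 150, 7: 50}

MIN_POPULATION = 500  # hypothetical size criterion


def neighbors_of_cluster(cluster, adjacency):
    """Tracts touching any member of the cluster but not already in it.

    Plays the role of querying neighbors of the unioned geometry."""
    frontier = set()
    for tract in cluster:
        frontier |= adjacency[tract]
    return frontier - cluster


def grow_seed(seed, adjacency, population, min_pop):
    """Grow a cluster from one seed until no neighbor meets the criterion."""
    cluster = {seed}
    while True:
        # Re-query the neighbors of the *current*, possibly larger, cluster.
        candidates = neighbors_of_cluster(cluster, adjacency)
        accepted = {t for t in candidates if population[t] >= min_pop}
        if not accepted:
            return cluster  # nothing left to absorb
        cluster |= accepted


print(sorted(grow_seed(1, ADJACENCY, POPULATION, MIN_POPULATION)))
# prints [1, 2, 3, 4]
```

With the sample data, tracts 2 and 3 are absorbed on the first pass (5 and 6 are too small), and tract 4 only becomes a candidate once tract 2 is part of the cluster.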

Here's where I am having trouble with the concept:

  • I run through a seed area and enumerate all its neighbors, and at the end of each iteration, I want to reflect that there is now potentially a bigger seed, and the seed will have a new potential list of neighbors it can "absorb".

So in a sense, I want to run a loop over the original list, but the list will be modified and re-queried every time I exhaust the neighbors of seed X. The next iteration, then, will face a new state of the geographic system.

I could do this in a linear fashion, repeating code blocks, but that is hardly ideal. Or maybe iterations are a bad idea--"walk" through a list and then do

So, to boil it down to one question: how can I set up an iteration in which the state changes, unlike typical list comprehensions where the options are fully enumerated up front from some initial state, so that subsequent tests reflect the new state of the (geographic) system?
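One possible way to structure the outer loop, sketched with hypothetical in-memory data in place of PostGIS: iterate over the original tract list, but keep the changing state in a separate `unassigned` set and consult it on every iteration instead of modifying the list being looped over. Seeds already absorbed by an earlier cluster are skipped, and a cluster can only absorb tracts that are still unassigned. The names `cluster_all`, `grow_seed`, and the data are illustrative assumptions:

```python
# Hypothetical stand-ins for the PostGIS layer (adjacency) and attributes.
ADJACENCY = {
    1: {2, 3, 5, 6}, 2: {1, 3, 4}, 3: {1, 2, 7}, 4: {2},
    5: {1, 6}, 6: {1, 5}, 7: {3},
}
POPULATION = {1: 900, 2: 1200, 3: 800, 4: 1500, 5: 100, 6: 150, 7: 50}
MIN_POPULATION = 500  # hypothetical size criterion


def grow_seed(seed, adjacency, population, min_pop, available):
    """Grow from `seed`, absorbing only qualifying tracts still in `available`."""
    cluster = {seed}
    while True:
        frontier = set().union(*(adjacency[t] for t in cluster)) - cluster
        accepted = {
            t for t in frontier
            if t in available and population[t] >= min_pop
        }
        if not accepted:
            return cluster
        cluster |= accepted


def cluster_all(tract_order, adjacency, population, min_pop):
    """Assign a cluster ID to every tract, checking the live state each time."""
    unassigned = set(tract_order)  # the mutable "state of the system"
    cluster_id = {}
    next_id = 0
    for seed in tract_order:
        # The state may have changed since the loop started:
        # skip seeds already absorbed by an earlier cluster.
        if seed not in unassigned:
            continue
        members = grow_seed(seed, adjacency, population, min_pop, unassigned)
        for tract in members:
            cluster_id[tract] = next_id
        unassigned -= members
        next_id += 1
    return cluster_id


print(cluster_all([1, 2, 3, 4, 5, 6, 7], ADJACENCY, POPULATION, MIN_POPULATION))
# prints {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 2, 7: 3}
```

Skipping already-claimed tracts, rather than letting later seeds compete for them, is a design choice; the result then depends on the order of the original list.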

ako
  • So a bit like how the k-means clustering algorithm proceeds? Perhaps, you could explain the algorithm a bit more, but you could do this in a loop in plpgsql or in Python, calling Postgis repeatedly, but it is very hard to give a more coherent answer, as the question is currently phrased. – John Powell Feb 22 '15 at 09:23
  • @JohnBarça, I added some detail and made it more specific. I had tried to make the problem a generic one, but that probably didn't help for clarity. – ako Feb 22 '15 at 18:07

1 Answer


The problem appears to be in your definition of "iteration". A similar example is the computation of cellular automata, where world updates are discrete and complete. A standard approach there is to create a "next iteration" world W_next from the "now" state of the world W. This is completely different from the single-world approach your example describes, in which tracts are updated in place while the loop is still running.

In pseudo-code the proper approach would be:

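W = starting_world()
while True:
    W_next = some_function_on(W)  # reads W, never modifies it
    if W_next == W:               # no change: the clustering is stable
        break
    W = W_next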
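A runnable version of the W / W_next idea, sketched in Python with hypothetical in-memory data and a hypothetical update rule (a qualifying tract takes the smallest cluster label among itself and its qualifying neighbors). `step` reads only the current world and returns a new dictionary, so no tract sees changes made during the same iteration:

```python
# Hypothetical adjacency and attributes standing in for the PostGIS layer.
ADJACENCY = {
    1: {2, 3, 5, 6}, 2: {1, 3, 4}, 3: {1, 2, 7}, 4: {2},
    5: {1, 6}, 6: {1, 5}, 7: {3},
}
POPULATION = {1: 900, 2: 1200, 3: 800, 4: 1500, 5: 100, 6: 150, 7: 50}
MIN_POPULATION = 500  # hypothetical size criterion


def qualifies(tract):
    return POPULATION[tract] >= MIN_POPULATION


def starting_world():
    """Every tract starts in its own cluster, labelled by its own ID."""
    return {tract: tract for tract in ADJACENCY}


def step(world):
    """Build W_next from W without modifying W (a synchronous update)."""
    world_next = {}
    for tract, label in world.items():
        if not qualifies(tract):
            world_next[tract] = label  # non-qualifying tracts stay alone
            continue
        # Only labels from the *current* world are read here.
        labels = [world[n] for n in ADJACENCY[tract] if qualifies(n)]
        world_next[tract] = min([label, *labels])
    return world_next


def run():
    world = starting_world()
    while True:
        world_next = step(world)
        if world_next == world:  # fixed point: no tract changed cluster
            return world
        world = world_next


print(run())
# prints {1: 1, 2: 1, 3: 1, 4: 1, 5: 5, 6: 6, 7: 7}
```

With this data, tract 4 only joins the cluster of 1, 2 and 3 on the second update, because on the first update it can see only its neighbor's old label. Note that this synchronous rule effectively finds connected groups of qualifying tracts, which is not identical to the seed-by-seed growth described in the question.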

This has the advantage of not flagging tracts as clustered before the entire world W has been analyzed. To be more explicit, using your example: on iteration one, tracts (1, 2, 3) would be clustered, and (5, 6) may or may not have been grouped with (9).

If your clustering rules are consistent, on iteration two you would already have the cluster (1, 2, 3) and could then decide whether 5, 6, or neither should be added to it.

msw