5

I am trying to develop an algorithm to solve a problem that I am not able to classify, I expose the subject:

You have a map divided into sections that have a certain area and where a certain number of people live.

The problem consists of finding sets of connected sections whose area does not exceed a certain value maximizing the number of selected inhabitants.

For now I can think of two approaches:

  • Treat the problem as an all-pairs shortest paths problem in an undirected graph with positive natural values where the solutions that do not meet the constraint of the maximum selected area will be discarded. For this you could use the Floyd-Warshall algorithm, Dijkstra for all pairs or Thorup algorithm (which can be done in time V * E, where these are the vertices and edges of the graph).
  • Treat it as an open vehicle routing problem with profits where each vehicle can start and end wherever it wants (open vehicle routing problem with profits or OVRPP).
  • Another aproach

Also, depending on the combinatorics of the particular problem it is possible in certain cases to use genetic algorithms, together with tabu search, but this is only for cases where finding an optimal solution is inadmissible.

To be clearer, what is sought is to obtain a selection of connected sections whose sum of areas does not exceed a total area. The parameter to maximize is the sum of populations of the selected sections. The objective is to find an optimal solution.

For example, this is the optimal selection with max area of 6 (red color area)

enter image description here

Thank you all in advance!

Olivier
  • 13,283
  • 1
  • 8
  • 24
AdCerros
  • 153
  • 1
  • 1
  • 11
  • 2
    This seems to be a clustering problem – ravenspoint Feb 11 '23 at 22:30
  • 1
    I agree with ravenspoint. Identify high density clusters, and connect the clusters with low area paths. In the example, section A has density 100/6=16.7, B has 50/4=12.5, C has 20/2=10, D has 70/3=23.3, and E has 75/3=25. D and E have the highest density, forming a cluster with area 6. Since that uses all of the allowed area, the problem is finished. In a larger version of the problem, you'll need to identify multiple clusters, and then attempt to connect the clusters with low area paths. – user3386109 Feb 11 '23 at 22:51
  • How many sections do you have? – Olivier Feb 24 '23 at 08:15
  • The range can go from 10 to 3600 with an average neighborhood of 5, in many occasions given the level of combinatorics it is quite possible that this can only be solved by tabu search or similar. In this case, I would like to know if this problem has a proper name and if it is feasible in a reasonable time. – AdCerros Feb 24 '23 at 13:28
  • How long do you consider reasonable? – ravenspoint Feb 25 '23 at 20:30
  • Possibly helpful: https://link.springer.com/chapter/10.1007/978-3-540-30559-0_31 – Dave Mar 02 '23 at 13:35

4 Answers4

2

One pragmatic approach would be to formulate this as an instance of integer linear programming, and use an off-the-shelf ILP solver. One way to formulate this as an ILP instance is build a graph with one vertex per section and an edge between each pair of adjacent sections; then, you want to select a connected component in that graph.

So, let x_{v,d} be a set of zero-or-one variables, one for each vertex v and for each d=0,1,..,n-1, where n is an upper bound on the number of vertices that might be selected (e.g., the total number of vertices in the graph). Add constraints to enforce that there is exactly one vertex, call it r, for which x_{r,0}=1. The intended meaning is that at each step we grow the size of the selected connected component, by adding vertices that are adjacent to ones that were already selected. After step 0, we have selected r. If x_{v,d}=1, then v is part of the connected component selected after d steps (and thus there is guaranteed to be a path of length <=d from r to v). It follows that the connected component we ultimately end up with, after all the steps complete, contains the vertices v where x_{v,n-1}=1.

You can enforce this meaning by adding the linear inequalities x_{v,d} >= x_{v,d-1}, and x_{v,d} <= x_{v,d-1} + sum_u x_{u,d-1} where the sum is over all u that are adjacent to v, and sum_v x_{v,0} = 1.

Finally, you have a constraint that the total area does not exceed the maximum: sum_v A_v x_{v,n-1} <= maxarea, where A_v is the area of section v.

Then your goal is to maximize sum_v P_v x_{v,n-1}, where P_v is the population of section v. The solution to this integer linear programming problem will give the optimal solution to your problem.

D.W.
  • 3,382
  • 7
  • 44
  • 110
  • 1
    Hello again D.W, I have developed this implementation in glpk and after many tests I have come to the conclusion that this solution is not correct. Since non-connected selections can be formed. This is because although the constraint sum_v x_v = 1 + sum_{u,v} y_{u,v} must be respected, there can be cases where for example three vertices form a triangle (connected) and on the other hand two vertices joined by an edge are selected (also connected). The constraint is satisfied, since there are 5 vertices and 4 edges, however the selected graph is not connected. – AdCerros Feb 21 '23 at 16:38
  • 1
    The theorem I think you are referring to |E| >= |V|-1 serves to prove that a graph is disjoint, but it does not prove that it is connected. – AdCerros Feb 21 '23 at 16:58
  • 1
    @AdCerros, Oops! You are quite right! That is a fatal flaw in my answer. Let me come back to this to propose a fix. – D.W. Feb 22 '23 at 00:19
  • @AdCerros, I apologize for the delay, and for my earlier erroneous answer. Please see my revised answer. I have attempted to fix the flaw and I think my solution might work now. Please check it carefully -- I hope it doesn't have any other flaws. – D.W. Mar 01 '23 at 08:29
  • Hello again, thank you very much for your reply. I would like to know if in the following case the result would be correct. I have some doubts about it, I think that only paths are being allowed. Imagine a connected tree with two branches, the model should be able to select the vertex v with v+1 but also the vertex v with v+2 since in this case the connection is not made between v+1 and v+2. Is this contemplated? – AdCerros Mar 01 '23 at 17:58
  • @AdCerros, what does v+1 and v+2 mean in this context? – D.W. Mar 02 '23 at 03:26
2

Originally was going to leave this as a comment, as it doesn't constitute a complete answer with coded solution, but I believe it may approximate the canonical one.

Your problem has been studied extensively (perhaps not this particular variation, but many others including this one.) It is a variant of the knapsack problem and is NP-complete or harder (NP-hard). This should be obvious since a special case of your problem is when all nodes are connected to all other nodes, which is precisely the 0-1 knapsack problem.

In reference (1), the authors show an approximation algorithm that approximates with lower and upper bounds of

(1−ε)/2 ·(1 − 1/e^(1−ε))

and

1 − 1/e + ε

respectively, which is probably as close as you will get in terms of a good approximation algorithm.

There is a simple variation of the dynamic programming solution to the 0-1 knapsack problem that would yield a good approximation, which I can discuss some more if there is interest and time.

(1) Borradaile, Glencora, Brent Heeringa, and Gordon Wilfong. "The knapsack problem with neighbour constraints." Journal of Discrete Algorithms 16 (2012): 224-235.

ldog
  • 11,707
  • 10
  • 54
  • 70
  • *"It is a variant of the knapsack problem and is NP-complete or harder (NP-hard). This should be obvious since a special case of your problem is when all nodes are connected to all other nodes, which is precisely the 0-1 knapsack problem."* <<< I upvoted but I disagree with this attempted proof of NP-hardness. The OP's problem appears to be restricted to planar graphs. The case of a complete graph, where all vertices are connected, is outside of this problem. Thus your reduction from the knapsack problem doesn't apply here. – Stef Mar 03 '23 at 23:24
  • @Stef Good point, if planarity of the graph is guaranteed, then this may not be NP-hard or NP-complete. The OP should clarify if graphs are guaranteed to be planar or if there are non-planar cases he cares about here. – ldog Mar 03 '23 at 23:33
  • 1
    @Stef If we restrict ourselves to only planar graphs, this is likely solvable in polynomial time using a non-submodular formulation on a binary MRF & computing MAP inference using this algorithm: http://proceedings.mlr.press/v9/schraudolph10a/schraudolph10a.pdf – ldog Mar 03 '23 at 23:38
  • Thank you very much for your answer and the bibliography provided, it is just what I was looking for. – AdCerros Mar 04 '23 at 19:18
1
 - Construct empty list of combined sections
 - LOOP s1 over sections
     - set found false
     - LOOP s2 over sections
         - IF s1 = s2 
             - CONTINUE 
         - IF s1,s2 not connected
             - CONTINUE
         - IF s1,s2 combined population > maximum
             - CONTINUE
         - ADD s1,s2 to list
         - set found true
         - break out of loop s2
     - IF found
         - CONTINUE loop s1

 - SET found = true
 - WHILE ( found )
      found = false
      - LOOP C over combination in list
          - LOOP U over uncombined sections
             - IF not connected
                 - CONTINUE
             - IF combined population > maximum
                 - CONTINUE
     - REPLACE C in list with C + U
     - found = true

- Calculate value of list ( sum of populations in sections in list )

Now you have an initial feasible solution and can generate more

- WHILE( reasonable time limit not reached )
     - random shuffle the order of sections used by the loops
     - run the the above algorithm to get a different feasible solution
     - IF new solution has a greater value than previous, replace

Here is the code to find a reasonable feasible solution

void combine()
{
    raven::set::cRunWatch aWatcher("combine sections");

    // Finds pairs of sections that can be combined
    for (auto &a1 : vSection)
    {
        // check for already combined
        if (a1.myfCombined)
            continue;
        
        // search for possible section to combine with a1
        for (auto &a2 : vSection)
        {
            // is it uncombined
            if (a2.myfCombined)
                continue;

            // is it itself
            if (a1.myName == a2.myName)
                continue;

            // is the the combined area under the limit
            if (a1.myArea + a2.myArea > maxArea)
                continue;

            // is it physically connected
            if (!a1.isConnected(a2))
                continue;

            // OK to combine
            cCombined comb;
            comb.add(a1);
            comb.add(a2);
            theCombined.push_back(comb);
            break;
        }
    }

    // Try adding uncombined sections to the combinations already found

    bool fimproved = true;
    while (fimproved)
    {
        // loop over the combinations until no more new combinations found
        fimproved = false;
        for (auto &C : theCombined)
        {
            // loop over uncombined seaction for possible addition to combimation
            for (auto &U : vSection)
            {
                // it it uncombined
                if (U.myfCombined)
                    continue;

                // is the the combined area under the limit
                if (C.myArea + U.myArea > maxArea)
                    continue;

                // is it physically connected
                if (!C.IsConnected(U))
                    continue;

                // OK, add to combination
                C.add(U);
                fimproved = true;
            }
        }
        // check we are still finding additional combinations
        if (!fimproved)
            break;
    }
}

The complete application code is at https://github.com/JamesBremner/so75423308

Here is the result of a unit test on a 10 section randomly generated problem

sections
S0 area 2 pop 68 connected 9 4 8 2
S1 area 1 pop 6 connected 7 5 2 6 4
S2 area 1 pop 54 connected 1 6
S3 area 1 pop 96 connected 6 1 8 9 2 7 5
S4 area 1 pop 4 connected 2
S5 area 2 pop 74 connected 1 3 8 7
S6 area 1 pop 63 connected 7 9 3 1 8 5 0
S7 area 1 pop 89 connected 0 2 4 8 6 5
S8 area 1 pop 30 connected
S9 area 1 pop 7 connected 3

combinations
S0 S2 S8 S9  | area 5 pop 159
S1 S4  | area 2 pop 10
S3 S5  | area 3 pop 170
S6 S7  | area 2 pop 152

Here is the result of a timing test on 2,000 node randomly generated problem

raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
       1        0.0299339       0.0299339       combine sections

Conclusion: The code generates a feasible solution for your worst case problem in 30 milliseconds, so, for your performance requirement of 2 minutes, 4,000 possible solutions can be checked. This may not be enough to guarantee the optimal will be found but you should get quite close. For smaller problems the optimal is guaranteed.

Here is a timing profile where 4,000 combinations are tried and the best result is stored

Try 18 best value now 100542
Try 101 best value now 100552
Try 183 best value now 100577
Try 308 best value now 100582
raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
    4000        0.0328638       131.455 try
    4001        0.0327868       131.18  combine sections

Note that because of the strict limits ( area and physical connection ) on which sections are candidates to be combined, there are a lot less than 4,000 feasible combinations. Many different initial arrangements result in the same combination generated.

If you have an actual, real world problem please post it and I will run my code on it for you.

ravenspoint
  • 19,093
  • 6
  • 57
  • 103
  • If you cannot provide a clear performance requirement, then your question has no answer. – ravenspoint Feb 27 '23 at 15:24
  • Found and fixed a bug. Updated results of timing profile. Conclusion unchanged. FYI here is the bug fix commit https://github.com/JamesBremner/so75423308/commit/03bfd507ebc79f6d266cb45b0a3e789ff7bcd0cf – ravenspoint Mar 01 '23 at 15:58
1

With thousands of sections, the problem has a high combinatorial complexity so you will probably need a heuristic. I propose this one. The idea is to start with a single section and expand it iteratively by selecting the neighbour with the highest ratio population / area. More precisely:

  1. Take an acceptable section s (i.e. with area(s) <= maxArea) and put it into a set : S = {s}.

  2. Consider all neighbour sections of S. Sort them by the ratio population / area in decreasing order. Find the first acceptable t element from that list (i.e. such that area(S) + area(t) <= maxArea).

  3. If an element t was found, add it to S and go back to step 2. If no element was found, you now have a candidate solution S.

  4. Take the next acceptable section and go back to step 1.

When it's done, you have a number of candidate solutions. Select the one with the highest population.

It's possible to improve this algorithm with a stochastic variant: instead of always taking the first acceptable element in the list, draw a random number to decide which one you select (e.g. 90% probability for the first one and 10% for the second). For every initial section, repeat the loop (steps 2 and 3) a number of times and see if it beats the determinisitic solution.

Olivier
  • 13,283
  • 1
  • 8
  • 24
  • Sorry, but this has already been tested and usually finds suboptimal solutions. This is because it only tests whether or not to include the section in an order, so many feasible solutions are eliminated. For example, imagine we include section A and examining its neighbors B and C we include C as the best and leave B out. Then we expand C and do not add its neighbors because they exceed the maximum area. A possible solution could be to now include B or at least examine if it fits in the selection, which this algorithm will not do. – AdCerros Feb 27 '23 at 12:23
  • I think that the concept of heuristics to which you refer is not entirely correct, in this problem can not find an admissible heuristic, all that can be extracted are pseudoheuristics with strategies such as tabu search or simulted annelaing. – AdCerros Feb 27 '23 at 12:48
  • @AdCerros *"which this algorithm will not do"* The stochastic version has a chance to choose B instead of C. – Olivier Feb 27 '23 at 19:37
  • The stochastic version is trivial, we are looking for an optimal and fast algorithm, not a coin flip. – AdCerros Feb 28 '23 at 11:57