2

Let L be a list of N items, from K < N different categories. For example, the string aababaaabaabb, where N=13, K=2. Let's suppose only swapping of consecutive elements is allowed, and only once per item, so items in the transformed list are never more than one position away of their original position. Swapping is costly, so there may be a limited swapping budget of M operations. These are the constraints.

I would like to obtain the list with the maximal structure, and the sequence of operations that lead to it. The "structure" can be objectively measured by a positive scalar function of the current state of the list. I don't think the details are important, what matters is that for a given arrangement of elements in the list there is a single value that characterises the arrangement. To make it concrete it could be for example the class entropy of the partition of the list in blocks of consecutive elements of the same type (as described here or here).

The entropy of a partition P = {p1,p2,...,pn} with 0 < pi < 1 and sum(pi) = 1 is (pi is the fraction, frequency or probability of each component):

H(P) = -sum(pi * log(pi))

I think this could be cast as an optimisation problem, a MIP perhaps, and solved that way, though I haven't been able to formulate it properly. This approach does not really attract me because the sequences I have are longer than the toy example, with more classes, and I have to do this a large number of times. There must be a more elegant solution. A greedy algorithm does not work either because this is a combinatorial problem and the global optimum (even within constraints) may not be reachable by a sequence of locally optimal moves.

I have a couple of questions:

  • Does this (type of) problems have a name or class that would help me find algorithmic solutions ? Can someone provide any helpful pointer ?

  • Is a brute force approach feasible ? The constraints severely restricts the search space but not to the point of trivialness... Any tip here ? (edited)

To end with the concrete example above, the non-redundant possibilities that make longer and more uniform blocks within the constraints are aaabbaaabaabb and aabbaaaabaabb (swapping 3 with 4 or 4 with 5, there are other equally good as the first). The initial partition would be {aa,b,a,b,aaa,b,aa,bb}, the fraction of each block {2/13, 1/13, 1/13, 1/13, 3/13, 1/13, 2/13, 2/13}, the fraction of each block within its class {2/8, 1/5, 1/8, 1/5, 3/8, 1/5, 2/8, 2/5} (the size or number of elements of class a is 8, and b is 5), and the class entropy H_0 the weighted sum of entropy for class a and b:

H_a = - 2/8*log(2/8) - 1/8*log(1/8) - 3/8*log(3/8) - 2/8*log(2/8)
    = 1.3208883

H_b = - 1/5*log(1/5) - 1/5*log(1/5) - 1/5*log(1/5) - 2/5*log(2/5)
    = 1.332179

H_0 = 8/13 * H_a + 5/13 * H_b
    = 1.3252309

while the entropy for the two possibilities above would be (lets call them H_1 and H_2)

H_a1 = - 3/8*log(3/8) - 3/8*log(3/8) - 2/8*log(2/8)
H_b1 = - 2/5*log(2/5) - 1/5*log(1/5) - 2/5*log(2/5)
H_1 = 8/13 * H_a1 + 5/13 * H_b1
    = 1.071705

H_a2 = - 2/8*log(2/8) - 4/8*log(4/8) - 2/8*log(2/8)
H_b2 = - 2/5*log(2/5) - 1/5*log(1/5) - 2/5*log(2/5)
H_2 = 8/13 * H_a2 + 5/13 * H_b2
    = 1.0455667

From these solutions the largest reduction in entropy is achieved by case 2, at the cost of 1 swap. There are two other solutions as good as case 1, but at a cost of 2 swaps.

Thanks a lot.

Edit 1 On the question of brute force and enumerating moves, it is actually simple. Swaps can only occur at a class boundary in the list. If there are J boundaries, then the total number of choices is upper bounded by:

Sum_i C_J,i <= 2^J - 1

where C_n,k is combination of n taken k at a time, and the sum goes from i = 1 to min(J,M). In the example, J = 7.

Community
  • 1
  • 1
rifius
  • 21
  • 3
  • Depending on how "structure" works, it sounds like this could be solved in linear time and constant space by dynamic programming. Your description of how the structure is measured is unclear, though. – user2357112 May 22 '17 at 03:15

0 Answers0