Merging two partial (jointly overdetermined) sets of ordering information

Question

I have a web app with data in a grid. The user can reorder the columns, and the server can change which columns exist. I would like to save the user's column order in a cookie and restore it on page load.

More formally, I have two arrays of unique IDs (strings) called user_columns and server_columns. I would like to reorder server_columns such that I respect all of the ordering information from user_columns, and as much from server_columns as possible. How do I do this? What's a reasonable formal definition of "as much as possible"?

My analysis so far:

One aspect of the problem is trivial: if the server removes some columns, delete the corresponding entries from user_columns. Any information about the ordering of columns which are no longer there is moot. The problem then becomes one of merging two potentially conflicting sets of ordering information.

This corresponds to a family of problems from voting theory: given a set of ballots, each of which contains a partial order between the candidates, produce a complete ordering of the candidates which in some sense reflects the ballots.

This leads me to think that I might get a workable solution by applying e.g. the Schulze Method or Ranked Pairs to a sufficiently rigged set of ballots based on user_columns and server_columns. For UX reasons, breaking ties by inserting new columns last (to the right) seems like a good idea to me.

Does this sound like it's on the right track?

Note also that we can consider three kinds of comparisons: A and B are both in user_columns, one of them is, or none of them are. The former and latter kinds are easily resolved (refer to user_columns and server_columns, respectively); the one in the middle, and its interactions with the latter, are the tricky parts.

Possibly related: http://stackoverflow.com/questions/15932601/sequence-merging-between-two-semi-synchronized-lists and http://stackoverflow.com/questions/6352212/how-to-merge-a-collection-of-ordered-preferences — Jonas Kölker, Mar 21 '14 at 22:30
Please clarify the question. Can the server reorder columns as well or just change the set of columns? Because in the former case, why do you mention voting theory at all? There's no voting necessary if only one side has an ordering at all — Niklas B., Mar 22 '14 at 03:05
Both `user_columns` and `server_columns` are arrays, and the server can change the order of columns, i.e. if A and B both occur in `server_columns` in two consecutive page loads, they can occur in two different orders. The more interesting property, I think, is that `user_columns` and `server_columns` can disagree in a single page load, but in either case I think the answer is "yes" :) — Jonas Kölker, Mar 22 '14 at 03:14
Would you be happy with an algorithm that minimizes the inversions with respect to `server_columns` and respects `user_columns` completely (zero inversions w.r.t. `user_columns`)? Because that how I interpret the question. — Niklas B., Mar 22 '14 at 04:25

Niklas B. · Answer 1 · 2014-03-22T05:48:15.110

4

Let's say we have C columns, numbered from 1 to C. We have two sequences sequences of columns, U = u₁, u₂, ... u_n and S = s₁, s₂, ... s_m. We want to find a permutation P of S, such that P has no inversions with regard to U and a minimal number of inversions with regard to S.

We can show that there is such an optimal P which is an interleaving of U ∩ S and S \ U. By "interleaving" I mean that P has no inversions with regard to U ∩ S or S \ U.

We can apply dynamic programming to find the optimal interleaving: Let A = (a_i) = U ∩ S and B = (b_j) = S \ U. Let f(i,j) be the number of inversions w.r.t. S of the optimal interleaving of the prefixes a_1...i of A and b_1...j of B. The idea is very similar to the longest common subsequence DP algorithm. We have the recurrence

f(0,j) = 0 for all j >= 0
f(i,0) = f(i-1, 0) + sum(k=1 to i-1, [1 if A[i] appears before A[k] in S])
f(i,j) = min(f(i-1, j) + sum(k=1 to i-1, [1 if A[i] appears before A[k] in S])
                       + sum(k=1 to j, [1 if A[i] appears before B[k] in S]),
             f(i, j-1) + sum(k=1 to i, [1 if B[j] appears before A[k] in S])
                       + sum(k=1 to j-1, [1 if B[j] appears before B[k] in S]))

I used the notation [1 if X] here to denote the value 1, if X is true and 0, if X is false.

The matrix f can be built in time O(|A|^2 * |B|^2). The minimal cost (number of inversions w.r.t. S) will be f(|A|, |B|).

We can reconstruct the optimal permutation using the DP matrix as well: We build it from back to front. We start with the tuple (i,j) = (|A|, |B|) and at every step depending on which of the two options is minimum in the DP transition, we know whether we need to put A[i] or B[j] to the front of the permutation. Then we proceed with (i-1, j) or (i, j-1) depending on which we choose.

Here is an implementation of the algorithm, please excuse my lack of JS skills.

edited Mar 22 '14 at 05:48

answered Mar 22 '14 at 05:14

Niklas B.

92,950
18
194
224

Regarding the interleaving proof: assume we have *b₂, aᵢ, ..., aⱼ, b₁* as part of the solution with *b₁* inverted relative to *b₂*. Let *Inv(1)* be the number of inversions between *b₁* and *aᵢ..aⱼ*, and *Inv(2)* likewise. If *Inv(1) ≤ Inv(2)* then *aᵢ, ..., aⱼ, b₁, b₂* has fewer inversions than the original(*new Inv(2) = old Inv(1) ≤ old Inv(2)*, and we fixed the *b₁/b₂* inversion), and similarly in reverse if *Inv(2) ≤ Inv(1)*. Since swapping adjacent out-of-order *b*-elements lowers the inversion count, the solution has the elements of *b* in order. QED. Correct? – Jonas Kölker Mar 23 '14 at 06:12
@JonasKölker Very good, that is actually very similar to the proof idea I had in mind and it is a very compact formulation of it. I like it – Niklas B. Mar 23 '14 at 06:16
@JonasKölker Actually I think swapping b1 and b2 can cost you one additional inversion, but since you also decrement it by putting b1 and b2 in order you get at least an equally good permutation. Or to be more precise, `new Inv(2) = old Inv(1) <= old Inv(2)` doesn't necessarily hold if I'm not mistaken, but it's a bit too late here for me to be sure about that right now – Niklas B. Mar 23 '14 at 06:23
@JonasKölker By the way in case you're interested, the recurrence can probably be solved in O(|A| * |B|) – Niklas B. Mar 23 '14 at 06:28
I'd be very interested in both a fix of my proof and a faster algorithm :) – Jonas Kölker Mar 23 '14 at 06:30
@Jonas The algorithmic part is easier: E.g. Let `g(x,K) = sum(k=1 to K, [1 if x appears before A[k] in S])`. Then we have `g(x,K) = g(x,K-1) + [1 if x appears before A[K] in S]`, so we can compute all the sums in O(1). You can answer "X appears before Y in S" queries in O(1) by precomputing the ranks of elements in S – Niklas B. Mar 23 '14 at 06:38

score 1 · Answer 2 · answered Mar 21 '14 at 22:59

1

One definition that might be reasonable is to minimize the number of inversions relative to the server's order, subject to the constraint that the restriction to the columns in common of the two orders be equal. I don't know offhand an algorithm to minimize this objective.

answered Mar 21 '14 at 22:59

David Eisenstat

64,237
7
60
120

This sounds like a reasonable definition. I suspect that the Schulze Method or Ranked Pairs can achieve this, but I haven't though enough about it. – Jonas Kölker Mar 21 '14 at 23:03
I think we can just use DP to optimally interleave the client column set with the server column set excluding the client columns – Niklas B. Mar 22 '14 at 05:14
@NiklasB. I thought that there might be something like that. I had to rush out the door five minutes after I started this answer. – David Eisenstat Mar 22 '14 at 13:23

score 1 · Answer 3 · edited May 23 '17 at 12:30

This relates to Niklas B's answer:

Theorem: Consider a sequence S = s₁, …, sₙ of some orderet set (e.g. integers). If i < j and sᵢ > sⱼ, then swapping sᵢ and sⱼ decreases the number of inversions—that is, let S' = s₁, …, sᵢ₋₁, sⱼ, sᵢ₊₁, …, sⱼ₋₁, sᵢ, sⱼ₊₁, …, sₙ; then S' has fewer inversions than S.

Intuitively stated: if two elements are out of order and you swap them, you're closer to having a sorted list.

Proof: Observe that the only elements that have a different relative order in S and S' are (sᵢ, sⱼ), (sᵢ, sₖ) and (sⱼ, sₖ) for each k where i < k < j. We know that (sᵢ, sⱼ) is an inversion in S but not S', so consider sₖ for some such k.

Either sₖ < sⱼ < sᵢ or sⱼ < sₖ < sᵢ or sⱼ < sᵢ < sₖ (we assume the elements of S to be unique).

In the first case, (sᵢ, sₖ) is an inversion in S and (sⱼ, sₖ) is an inversion in S'. In the second case, (sᵢ, sₖ) and (sⱼ, sₖ) are inversions in S but not in S'. In the third case, (sⱼ, sₖ) is an inversion in S and (sᵢ, sₖ). These are all the changes in inversions.

In each case, the number of inversions in S' is either the same as that in S or it is smaller. Recall that (sᵢ, sⱼ) got fixed from S to S', and we get the desired result. ■

Thus, if we have a₁, bᵢ, …, bⱼ, a₂ with each a ∈ S \ U and each b ∈ U ∩ S and a₁ > a₂ and we swap a₁ and a₂, getting a₂, bᵢ, …, bⱼ, a₁, the inversion count is lower. Since such swaps only rearrange elements of S \ U and not those of U ∩ S, any solution which has zero inversions on U ∩ S and (subject to that) a minimal number of inversions on S \ U must make all such swaps.

Ergo: the elements of S \ U must occur in order, and thus the solution is an interleaving of U ∩ S and S \ U.

Kudos for doing the proof on your own, I really appreciate that :) It looks good to me, I had something wrong yesterday when I thought there was a problem. My approach was something like this: Let's say we have `b2 ai, ..., aj b1` as a substring of the permutation and we have `X < b1 < Y < b2 < Z` in S, where `X`, `Y` and `Z` represent subsets of `ai, ..., aj`. Then `b2 ai, ..., aj b1` has `1 + |X| + 2|Y| + |Z|` inversions, `b1 b2 ai, ..., aj` has `2|X| + |Z|` and `ai, ..., aj b1 b2` has `2|Z| + |Y|`. So depending on whether `|X| < Z|` or not we can always do strictly better using one of those — Niklas B., Mar 23 '14 at 21:07

Merging two partial (jointly overdetermined) sets of ordering information

3 Answers3

Linked