How do I calculate the delta (inserted/deleted/moved indexes) of two lists?

Question

Assuming I have two lists of objects that have unique ids and an attribute that determines their order, how can I efficiently get the delta indexes (which indexes were inserted, which were deleted, and which were moved)?

Example of input:

let before: [(id: String, timestamp: String)] = [
    ("A", "2015-06-04T12:38:09Z"),
    ("B", "2015-06-04T10:12:45Z"),
    ("C", "2015-06-04T08:39:55Z"),
    ("D", "2015-06-03T23:58:32Z"),
    ("E", "2015-06-01T00:05:51Z"),
]

let after: [(id: String, timestamp: String)] = [
    ("F", "2015-06-04T16:13:01Z"),
    ("C", "2015-06-04T15:10:29Z"),
    ("A", "2015-06-04T12:38:09Z"),
    ("B", "2015-06-04T10:12:45Z"),
]

let delta = deltaFn(before, after)

Here's the above visualized:

BEFORE                                   AFTER
+-------+----+----------------------+    +-------+----+----------------------+
| index | id | timestamp            |    | index | id | timestamp            |
+-------+----+----------------------+    +-------+----+----------------------+
|     0 |  A | 2015-06-04T12:38:09Z |    |     0 |  F | 2015-06-04T16:13:01Z |
|     1 |  B | 2015-06-04T10:12:45Z |    |     1 |  C | 2015-06-04T15:10:29Z |
|     2 |  C | 2015-06-04T08:39:55Z |    |     2 |  A | 2015-06-04T12:38:09Z |
|     3 |  D | 2015-06-03T23:58:32Z |    |     3 |  B | 2015-06-04T10:12:45Z |
|     4 |  E | 2015-06-01T00:05:51Z |    |     - |    |                      |
+-------+----+----------------------+    +-------+----+----------------------+

Expected result (delta):

Inserted indexes:  [0]
Deleted indexes:   [3, 4]
Moved indexes:     [(from: 0, to: 2), (from: 1, to: 3), (from: 2, to: 1)]

Nope, it's the value that determines order, so it has to change for the order of items to change. In the example above, the timestamp of C changes so that it goes up above A and B. — Blixt, Jun 04 '15 at 20:43
@Blixt Would you mind directing us what you find unsatisfying in the current solution, what would you expect from a solution? current solutions include algorithmic approach how to do it + complexity analysis (mine for example), and code (@MartinR for example). What exactly are you after? — amit, Jun 11 '15 at 07:53

score 19 · Accepted Answer · edited Aug 08 '15 at 01:30

19

It can be solved by using 2 maps, that map from the ID of each element to its index, and comparing them.

Time complexity is O(n) for hash maps and O(nlogn) for tree based maps.

Pseudo code:

map1 = empty map
map2 = empty map
for each element x with index i in before:
    map1.insert(x,i)
for each element x with index i in after:
    map2.insert(x,i)

//find moved and deleted:
for each key x in map1:
   id1 = map1.get(x)
   id2 = map2.get(x)
   if id2 == nil:
       add id1 to "deleted indexes"
   else if id1 != id2:
       add (id1,id2) to "moved indexes"
       map2.delete(x)
//find new indexes:
for each key x in map2:
    add map2.get(x) to "inserted indexes"

Edit: (Suggested in comments)

You can minimize memory output to O(min{m,n}) and time in case of tree-based map to O(max{m,n}log(min{m,n})), where m,n are the sizes of the two lists, by mapping only the smallest list, and then iterating the array (which was not mapped) rather than the map.

map = empty map
for each element x with index i in smaller list:
    map.insert(x,i)

for each element x with index i1 in larger list:
   i2 = map.get(x)
   if i2:
       if i1 != i2:
           add (i2, i1) to "moved indexes" if smaller list is before
           add (i1, i2) to "moved indexes" if smaller list is after
       map.delete(x)
   else:
       add i1 to "inserted indexes" if smaller list is before
       add i1 to "deleted indexes" if smaller list is after

// Find new indexes:
for each key x in map:
    add map.get(x) to "deleted indexes" if smaller list is before
    add map.get(x) to "inserted indexes" if smaller list is after

edited Aug 08 '15 at 01:30

Blixt

49,547
13
120
153

answered Jun 04 '15 at 21:00

amit

175,853
27
231
333

1

I think it can be done only with one map though. It should suffice to transform only the "before" list to a map. What do you think about it? – Gentian Kasa Jun 16 '15 at 07:42
1

@GentianKasa Thanks for your comment. If you choose to load only "Before" (or after, one of them arbitrarly), then it doesn't make a real difference (asymptotically). However, if you smartly load only the shorter list, you shrink memory output to `O(min{m,n})`, and also time in case of tree-based maps. See edit. – amit Jun 16 '15 at 09:41
1

@amit I tried to follow your heuristic (I updated it because it said `if id2 == nil then add id2 …` which I think wasn't right), but I'm not getting the right results. See this example and let me know if you have any suggestions: https://gist.github.com/blixt/e461c0296e2b9162e2fd – Blixt Aug 08 '15 at 01:26
@Blixt Yes, you are correct - you need to remove the item from the list even for same index. Did it solve your problem? – amit Aug 08 '15 at 09:29
@amit I think so, yes. Thank you! – Blixt Aug 10 '15 at 14:35
@amit There is one more concern with this algorithm that I've noticed. The "moved" list isn't simplified based on what's inserted/deleted. This means that adding one item to the beginning of a list of 99 items will yield 1 index in `inserted` and 99 index pairs in `moved`. Ideally, the `moved` list would only contain indexes for items that actually moved (relatively) within the list. Is this possible with your algorithm? – Blixt Aug 10 '15 at 19:49

Martin R · Answer 2 · 2015-06-04T22:08:51.933

A possible solution (similar to @amit's answer, but using only a single map):

// A dictionary mapping each id to a pair
//    ( oldIndex, newIndex )
// where oldIndex = -1 for inserted elements
// and newIndex = -1 for deleted elements.
var map : [ String : (from: Int, to: Int)] = [:]

// Add [ id : (from, -1) ] for each id in before:
for (idx, elem) in enumerate(before) {
    map[elem.id] = (from: idx, to: -1)
}

// Update [ id : (from, to) ] or add [ id : (-1, to) ] for each id in after:
for (idx, elem) in enumerate(after) {
    if (map[elem.id]?.to = idx) == nil {
        map[elem.id] = (from: -1, to: idx)
    }
}

var insertedIndices : [Int] = []
var deletedIndices : [Int] = []
var movedIndices : [(from: Int, to: Int)] = []

// Compare from: and to: index for each dictionary value:
for pair in map.values {
    switch pair {
    case (let fromIdx, -1):
        deletedIndices.append(fromIdx)
    case (-1, let toIdx):
        insertedIndices.append(toIdx)
    default:
        movedIndices.append(pair)
    }
}

println(insertedIndices) // [0]
println(deletedIndices)  // [3, 4]
println(movedIndices)    // [(1, 3), (0, 2), (2, 1)]

Alternatively, use optionals to indicate the absence of old or new index, as suggested by @doisk:

// A dictionary mapping each id to a pair
//    ( oldIndex, newIndex )
// where oldIndex = nil for inserted elements
// and newIndex = nil for deleted elements.
var map : [ String : (from: Int?, to: Int?)] = [:]

// Add [ id : (from, nil) ] for each id in before:
for (idx, elem) in enumerate(before) {
    map[elem.id] = (from: idx, to: nil)
}

// Update [ id : (from, to) ] or add [ id : (nil, to) ] for each id in after:
for (idx, elem) in enumerate(after) {
    map[elem.id] = (map[elem.id]?.from, idx)
}

// Compare:
var insertedIndices : [Int] = []
var deletedIndices : [Int] = []
var movedIndices : [(from: Int, to: Int)] = []

for pair in map.values {
    switch pair {
    case (let .Some(fromIdx), let .Some(toIdx)):
        movedIndices.append(from: fromIdx, to: toIdx)
    case (let .Some(fromIdx), .None):
        deletedIndices.append(fromIdx)
    case (.None, let .Some(toIdx)):
        insertedIndices.append(toIdx)
    default:
        fatalError("Oops") // This should not happen!
    }
}

Would optionals for the (from, to) tuple work better? So you could do your second for loop like: `map[elem.id] = (map[elem.id]?.from, idx)` — oisdk, Jun 04 '15 at 22:02
@doisk: That is a good suggestion. I had actually also prepared a version using optionals but dropped it because the switch-statement because a bit "uglier". But I did not think of this simplified update statement. I will add it as an alternative. — Martin R, Jun 04 '15 at 22:06

score 1 · Answer 3 · answered Jun 04 '15 at 21:40

My solution does not use the map function. Computational complexity is O(n * m) where n: elms in before and m: elms in after.

And I am afraid this is not the best solutions available... however here it is :)

import Foundation

// Elm class that contains id and timestamp and is Equatable
class Elm {
    let id : String
    let timestamp : String
    init(tuple : (id:String, timestamp:String)) {
        self.id = tuple.id
        self.timestamp = tuple.timestamp
    }
}
func ==(lhs: Elm, rhs: Elm) -> Bool {
    return lhs.id == rhs.id
}
extension Elm : Equatable {}

// data
let before: [Elm] = [
    Elm(tuple: ("A", "2015-06-04T12:38:09Z")),
    Elm(tuple: ("B", "2015-06-04T10:12:45Z")),
    Elm(tuple: ("C", "2015-06-04T08:39:55Z")),
    Elm(tuple: ("D", "2015-06-03T23:58:32Z")),
    Elm(tuple: ("E", "2015-06-01T00:05:51Z"))
]

let after: [Elm] = [
    Elm(tuple: ("F", "2015-06-04T16:13:01Z")),
    Elm(tuple: ("C", "2015-06-04T15:10:29Z")),
    Elm(tuple: ("A", "2015-06-04T12:38:09Z")),
    Elm(tuple: ("B", "2015-06-04T10:12:45Z"))
]

// O(m * n)
func inserted(before:[Elm], after:[Elm]) -> [Int] {
    var inserted = [Int]()
    for (index, elm) in enumerate(after) {
        if !contains(before, elm) {
            inserted.append(index)
        }
    }
    return inserted
}

// O(n * m)
func deleted(before:[Elm], after:[Elm]) -> [Int] {
    var deleted = [Int]()
    for (index, elm) in enumerate(before) {
        if !contains(after, elm) {
            deleted.append(index)
        }
    }
    return deleted
}

// O(n * m)
func moved(before:[Elm], after:[Elm]) -> [Int:Int] {
    var moved = [Int:Int]()
    for (index, elm) in enumerate(before) {
        if contains(after, elm) && (after[index] != before[index]) {
            moved[index] = find(after, elm)
        }
    }
    return moved
}

inserted(before, after)
deleted(before, after)
moved(before, after)

score 0 · Answer 4 · edited May 23 '17 at 12:24

Here's what I managed:

var map: [String : (bef: Int?, aft: Int?)] = [:]

for (idx, (bef, aft)) in zipWithPadding(before, after).enumerate()
  where bef?.id != aft?.id {
  bef.map{map[$0.id] = (idx, map[$0.id]?.aft)}
  aft.map{map[$0.id] = (map[$0.id]?.bef, idx)}
}

for (val, id) in map {
  switch id {
  case (_, nil):  print("\(val): del at \(id.bef!)")
  case (nil, _):  print("\(val): ins at \(id.aft!)")
  default:        print("\(val): mov from \(id.bef!) to \(id.aft!)")
  }
}

//D: del at 3
//E: del at 4
//F: ins at 0
//B: mov from 1 to 3
//A: mov from 0 to 2
//C: mov from 2 to 1

This method is pretty much the same as the other map answers, except it has one fewer loop, and it skips values that are the same in each array. The map here is a dictionary of Strings (the ids in your array) and tuples. The tuples are Ints, corresponding to the index of a given id in your first array, and the index of that same id in the second. The Ints are optional: this is how we figure out what happened to each id. If the first is nil, and the second is not, then that id was inserted. If the second is nil, though, it was removed. And if both Ints are not nil, then that id has been moved from the first to the second.

The way to fill up the map is by looping through the output of the zipWithPadding function, which is here:

func zipWithPadding <
  S0: SequenceType, S1: SequenceType, E0, E1 where
  S0.Generator.Element == E0, S1.Generator.Element == E1
  > (s0: S0, _ s1: S1) -> AnyGenerator<(E0?, E1?)> {

    var (g0, g1) :
    (S0.Generator?, S1.Generator?) =
    (s0.generate(), s1.generate())

    return anyGenerator {
      let e0: E0? = g0?.next() ?? {g0 = nil; return nil}()
      let e1: E1? = g1?.next() ?? {g1 = nil; return nil}()
      return (e0 != nil || e1 != nil) ? (e0, e1) : nil
    }
}

I got it from here. The reason you can't use the standard library zip is that it would terminate once either of the underlying sequences did. Here, though, the sequences are of different length. This zip function returns a generator of tuples of successive elements from its two sequence arguments. If either sequence finishes before the other, the subsequent tuples returned will have that sequence's value as nil. Here's an example:

Array(zipWithPadding([1, 2, 3], [1, 2]))
//[({Some 1}, {Some 1}), ({Some 2}, {Some 2}), ({Some 3}, nil)]

Because generators aren't guaranteed to continually return nil after they've returned nil once (a generator returns nil to signal that it's finished), you can't just keep calling the same generator for your tuple value. That's why the generator itself is set to nil once it returns nil: so that you don't call it anymore.

However, Array generators seem to return nil after the last value anyway. So, if you didn't mind undefined behaviour:

func zipWithPadding <
  S0: SequenceType, S1: SequenceType, E0, E1 where
  S0.Generator.Element == E0, S1.Generator.Element == E1
  > (s0: S0, s1: S1) -> AnyGenerator<(E0?, E1?)> {

    var (g0, g1) = (s0.generate(), s1.generate())

    return anyGenerator {
      let (e0, e1) = (g0.next(), g1.next())
      return e0 != nil || e1 != nil ? (e0, e1) : nil
    }
}

Once you have your generator to loop through, the rest of the idea is easy. The before and after ids are put into a dictionary, and if they weren't in the dictionary already, the corresponding index in the tuple is set to nil. (map[$0.id]?.aft will return nil if $0.id isn't in the dictionary).

In terms of efficiency, there are a few places I think this approach would work. It seems better that it's using only one loop and not two, but the custom zipWithPadding function adds so much overhead the single loop is actually less efficient than two sequential loops. Similarly, using only one enumerate() seems efficient, but again, the overhead isn't worth it. (it's worth noting that if the two arrays were the same length, the standard library zip would give you a very fast option here)

This method does allow you to skip elements that are the same in both arrays, though, something that you can't do with two for loops. In some quick testing, it looks like arrays with more than a quarter of their elements in the same positions are more quickly handled by this method. This would also speed up the looping through the map afterwards: it would only contain differences, so it would be shorter, and you wouldn't need to check for changes.

UPDATE:

I was trying to figure out how to get rid of some of the overhead, especially with regards to the generator. I made a custom struct:

struct PaddedZipGenerator<G0: GeneratorType, G1: GeneratorType> : GeneratorType {

  typealias E0 = G0.Element
  typealias E1 = G1.Element

  typealias Element = (E0?, E1?)

  private var (g0, g1): (G0?, G1?)

  mutating func next() -> PaddedZipGenerator.Element? {
    let e0: E0? = g0?.next() ?? {g0 = nil; return nil}()
    let e1: E1? = g1?.next() ?? {g1 = nil; return nil}()
    return (e0 != nil || e1 != nil) ? (e0, e1) : nil
  }
}

struct PaddedZip<S0: SequenceType, S1: SequenceType> : SequenceType {

  typealias Generator = PaddedZipGenerator<S0.Generator, S1.Generator>

  private let (s0, s1): (S0, S1)

  func generate() -> PaddedZip.Generator {
    return PaddedZipGenerator(g0: s0.generate(), g1: s1.generate())
  }
}

func zipWithPadding<S0: SequenceType, S1: SequenceType>(s0: S0, _ s1: S1) -> PaddedZip<S0, S1> {
  return PaddedZip(s0: s0, s1: s1)
}

And it seems like it worked! With some basic testing it seems like this zipWithPadding function works quite fast. It seems to work faster than two for loops, even when both lists contain none of the same elements.

Aaron Rasmussen · Answer 5 · 2015-06-09T17:41:34.817

Here's what mine would look like:

func deltaFn(before: [(id: String, timestamp: String)], after: [(id: String, timestamp: String)] ) -> ([Int], [Int], [String]) {

    // Get arrays of just the ids...
    let beforeIds = before.map { $0.id }
    let afterIds = after.map { $0.id }

    // Get the inserted and moved indexes...
    let (inserted, moved) = reduce(0..<afterIds.count, (inserted: [Int](), moved: [String]())) { 

        (var changes, index) -> ([Int], [String]) in

        if let beforeIndex = find(beforeIds, afterIds[index])  {
            if beforeIndex != index {
                changes.moved.append("(from: \(beforeIndex), to: \(index))")
            }
        } else {
            changes.inserted.append(index)
        }
        return changes
    }

    // Get the deleted indexes...
    let deleted = reduce(0..<beforeIds.count, [Int]()) { deleted, index in
        return contains(afterIds, beforeIds[index])
            ? deleted
            : deleted + [index]
    }

    // Return them all as a tuple...
    return (inserted, deleted, moved)
}

let (inserted, deleted, moved) = deltaFn(before, after)

println("Inserted: \(inserted)")  // Inserted: [0]
println("Deleted: \(deleted)")    // Deleted: [3, 4]
println("Moved: \(moved)")        // Moved: [(from: 2, to: 1), (from: 0, to: 2), (from: 1, to: 3)]

It works as expected, and is relatively easy on the eyes.

Note that the syntax of the calls to reduce will be different if you are using Swift 2.0. For example,

reduce(0..<afterIds.count, (inserted: [Int](), moved: [String]()))

becomes...

(0..<afterIds.count).reduce((inserted: [Int](), moved: [String]()))

How do I calculate the delta (inserted/deleted/moved indexes) of two lists?

5 Answers5

Linked