How to how similar the sequence of elements in two arrays are

Question

I have two arrays. A contains an ordered list of elements, such as [e1, e2, e2, e3, e4, e5, e5]. B is a subset of A, with repeats, such as [e1, e1, e2, e5, e4]. In practice the arrays will be larger than this (probably under 10,000 elements in length), and performance matters.

How can I quantitatively determine how similar the order of elements in the two arrays are? (Ideally without just brute forcing a comparison)

So you need to make each of them *a set* (a collection of unique elements), and then if two sets are same - they are equal - and removing repetitions like in https://stackoverflow.com/questions/3350641/array-remove-duplicate-elements — dmitryro, Jun 27 '18 at 15:36
See https://en.wikipedia.org/wiki/Edit_distance for a bunch of popular metrics for strings, but they could be directly applied to arrays — c2huc2hu, Jun 27 '18 at 15:44
@dmitryro I tried to make this clear in the question, but I've edited it to be more specific. I need to compare the order. — Jones, Jun 27 '18 at 15:46
@Jones is it possible to provide a sample input and how would the desired output will look ? — zenwraight, Jun 27 '18 at 16:09
@user3080953 Thanks, I wasn't sure what to google, but edit distance looks like what I'm looking for. — Jones, Jun 27 '18 at 16:13
You may want to look at [dynamic time warping](https://en.m.wikipedia.org/wiki/Dynamic_time_warping), a variation on edit distance, if repetitions are allowed. — templatetypedef, Jun 27 '18 at 16:28
@templatetypedef Yes, this is exactly what I was looking for. "Dynamic Time Warping" what a scary name. Can you post this as an answer? — Jones, Jun 27 '18 at 17:30

score 1 · Answer 1 · answered Jun 27 '18 at 16:25

First, you need to verify that each element of B belongs to A. Since A is a sorted array, this is a log(n) operation (using binary search) which needs to be repeated n times at most and that makes a nxlog(n) operation.

Second, you need to verify that elements of B are sorted or count the number of swaps needed to sort them. This is again a nxlog(n) operation using e.q. quicksort or any other efficient algorithm.

score 1 · Accepted Answer · answered Jun 27 '18 at 21:26

If you're looking to see how similar two different sequences of data are, given that there might be a lot of repeated copies of the same data, you might want to check out dynamic time warping, an algorithm that measures this quantity. It has applications in speech recognition, time-series data analysis, and string similarity.

How to how similar the sequence of elements in two arrays are

2 Answers2