0

I have two arrays. A contains an ordered list of elements, such as [e1, e2, e2, e3, e4, e5, e5]. B is a subset of A, with repeats, such as [e1, e1, e2, e5, e4]. In practice the arrays will be larger than this (probably under 10,000 elements in length), and performance matters.

How can I quantitatively determine how similar the order of elements in the two arrays are? (Ideally without just brute forcing a comparison)

Jones
  • 1,154
  • 1
  • 10
  • 35
  • So you need to make each of them *a set* (a collection of unique elements), and then if two sets are same - they are equal - and removing repetitions like in https://stackoverflow.com/questions/3350641/array-remove-duplicate-elements – dmitryro Jun 27 '18 at 15:36
  • See https://en.wikipedia.org/wiki/Edit_distance for a bunch of popular metrics for strings, but they could be directly applied to arrays – c2huc2hu Jun 27 '18 at 15:44
  • @dmitryro I tried to make this clear in the question, but I've edited it to be more specific. I need to compare the order. – Jones Jun 27 '18 at 15:46
  • @Jones is it possible to provide a sample input and how would the desired output will look ? – zenwraight Jun 27 '18 at 16:09
  • @user3080953 Thanks, I wasn't sure what to google, but edit distance looks like what I'm looking for. – Jones Jun 27 '18 at 16:13
  • You may want to look at [dynamic time warping](https://en.m.wikipedia.org/wiki/Dynamic_time_warping), a variation on edit distance, if repetitions are allowed. – templatetypedef Jun 27 '18 at 16:28
  • @templatetypedef Yes, this is exactly what I was looking for. "Dynamic Time Warping" what a scary name. Can you post this as an answer? – Jones Jun 27 '18 at 17:30

2 Answers2

1

First, you need to verify that each element of B belongs to A. Since A is a sorted array, this is a log(n) operation (using binary search) which needs to be repeated n times at most and that makes a nxlog(n) operation.

Second, you need to verify that elements of B are sorted or count the number of swaps needed to sort them. This is again a nxlog(n) operation using e.q. quicksort or any other efficient algorithm.

Trifon
  • 1,081
  • 9
  • 19
1

If you're looking to see how similar two different sequences of data are, given that there might be a lot of repeated copies of the same data, you might want to check out dynamic time warping, an algorithm that measures this quantity. It has applications in speech recognition, time-series data analysis, and string similarity.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065