What is the quickest way to group an array of double pairs

Question

Input is 200 K doubles pairs i.e. keypair. they have no range i.e. keys can be any numbers.

Now, the task is to group such big data into the following:

key1 => (value1, value2, value3)
key2 => (value1, value2, value3)
...
...

a EPSILON (e.g. 1e-8) can be considered when comparing equality of key (doubles).

I have developed a O(n^2) solution but it is too slow for 200K doubles. wondering any improvements? such as O(nlogn) would be nice.

Another example,

Input:  <0.1, 0.3>,  <0.1, 0.4>,  <0.2, 0.5>, <0.2, 0.6>, <0.3, 0.7>
Output

0.1 => (0.3, 0.4)
0.2 => (0.5, 0.6)
0.3 => (0.7)

How would you handle the case that you have N values that are all within EPSILON of some other value in the consideration set, e.g. (1.0, 1.0 + EPSILON, 1.0 + 2xEPSILON, 1.0 + 3xEPSILON)? — Eric J., May 20 '14 at 19:48
So you could have 1,000,000 numbers following that pattern, and they would all have the same key? But if you had (1.0, 1.0 + 2xEPSILON, 1.0 + 4xEPSILON, ...) they would have different keys? Does not make sense to me. — Eric J., May 20 '14 at 19:51

score 1 · Answer 1 · answered May 20 '14 at 19:49

1

Why not sort? Sort according to first value and you are (almost) done. It is O(nlogn).

answered May 20 '14 at 19:49

ElKamina

7,747
28
43

score 1 · Accepted Answer · edited May 23 '17 at 11:50

1

In order to avoid the issue of the grouping key depending on other grouping keys - the issue of treating keys like

(1.0, 1.0 + EPSILON, 1.0 + 2xEPSILON, 1.0 + 3xEPSILON)

consistently with keys like

(1.0, 1.0 + 2xEPSILON, 1.0 + 4xEPSILON, ...)

the most logical choice seems to be to use a HashSet and create a hash key by quantizing the actual double value of the key into buckets EPSILON in size.

Depending on your requirements for EPSILON, you might adopt the discussion below to quantize your expected input range into something like a long:

Convert/Quantize Float Range to Integer Range

edited May 23 '17 at 11:50

Community

1
1

answered May 20 '14 at 19:53

Eric J.

147,927
63
340
553

thanks.. I have changed my algorithm to O(n) using hash table, works like a charm... lighting fast! – justyy May 21 '14 at 10:36

What is the quickest way to group an array of double pairs

2 Answers2