1

I'm working on a project where I have to receive a big amount of records (app. 20K), each of which represents a point (x,y) which are in decimal points. I have the Point object, and a double value m = user input I need to eliminate all points which has another point closer than m to it, for example if m = 0.1 and p1 = {1.21,1.32}, p2 = {1.21,1.31} p3 = {1.20, 1.32} p4 = {1.55, 1.31} I need to eliminate p2, p3 (as close points to of p1) but I'll keep p4 as it's distance with any of other points is bigger than 0.1.

I implemented an algorithm, but it takes more than 3 hours to check this (for 20K of record, which I think to be ridiculous, is there is any way to do this using .NET framework up to 4.5 ?

Community
  • 1
  • 1
Mo-
  • 155
  • 15
  • 2
    20K amount of decimal points doesn't look to me big at all and it should not take 3 hours to complete, even with brut-force solution. – Tigran Feb 21 '17 at 23:25
  • 2
    as much as I can understand from the question, there should be something terribly wrong with implementation, so it worth to show it.at least sniplet, or basic logic scheme, if it is too big for some reason. – Tigran Feb 21 '17 at 23:27
  • @Tigran I'm not sure, https://paste.ofcode.org/hkb5YiKASmyFZsQ9ZxhuHS – Mo- Feb 21 '17 at 23:36

1 Answers1

1

Here's a few things to try

  1. Remove the Console.WriteLine statements. Outputting 20K x 20K = 400M rows to the console, all by itself, will take hours and hours, even if the program does nothing else. If you absolutely have to keep some sort of output, you can save a lot of processing time by outputting to the same line instead of scrolling.

  2. Consider sorting your list prior to looping through it, and modifying your loops so that you only need to compare items that are near one another in the sorted list. For example, if you sort by Y, your outer loop will remain the same but you can replace the inner for statement with a while (full[j].Y < full[i].Y + maxVal). Once you've reached a part of your list where no element could possibly be within maxVal, you can exit the inner loop and move on to the next value of i. This will change your performance profile from O(N^2) to O(N)... way better.

  3. If you don't need more than seven significant digits, consider using float instead of double, which will speed up the math quite a bit.

  4. Consider pre-allocating memory for duplicated. Whenever that list grows beyond its allocated size, .NET will have to allocate a new list (possibly triggering garbage collection) and copy all the bytes from the old one. You can pre-allocate space using this sort of syntax:

    var list = new List<Point>(20000);
    
Community
  • 1
  • 1
John Wu
  • 50,556
  • 8
  • 44
  • 80
  • 1
    Using #2 only reduced time to nearly 3 seconds, with single thread :-) great solution – Mo- Feb 22 '17 at 15:11