-1

Im asking this question in relation to this question posted a few months ago

Currently, in the app I am working, I get series of numbers where numbers may be missing and duplicated but are ordered in ascending manner.

There were two problems.

  1. If there were not duplicates, finding the missing number was pretty easy using the method suggested in the accepted answer of the mention question.

  2. But, if there are duplicates, that approach doesnt work anymore.

How can I solve the problem? No logic seems to work out. And even if it would(using loop), it wouldn't be efficient.

NOTE: I also searched for some libraries but couldn't find any.

Sambhav Khandelwal
  • 3,585
  • 2
  • 7
  • 38
  • 1
    Seems you could simply start by removing duplicates, then apply the previous algorithm. Since numbers are ordered, removing duplicates has the efficiency of O(n) algorithms. – Alexandre Fenyo Feb 20 '23 at 08:56
  • How can I do that? without loops? The data is my app needs to update every 2-3 seconds and so this approach is not good, is it? – Sambhav Khandelwal Feb 20 '23 at 08:57
  • [How to efficiently remove duplicates from an array without using Set](https://stackoverflow.com/questions/17967114/how-to-efficiently-remove-duplicates-from-an-array-without-using-set) – QBrute Feb 20 '23 at 08:58
  • Sorry, but I missed out something. The duplicates are necessary for proper functioning of the app and cant be eliminated. – Sambhav Khandelwal Feb 20 '23 at 08:59
  • 1
    You shouldn't remove the duplicates from your actual payload. You should remove the duplicates from a temporary copy of that data, to make the algorithm work. – QBrute Feb 20 '23 at 09:04
  • 2
    There’s no point removing the duplicates (an O(n) operations) just to be able to optimise finding missing numbers. You’re already iterating through the elements, so you might as well look for the missing number while you do it. I don’t think there’s a way to do better than O(n) in this case. – Tim Moore Feb 20 '23 at 09:33
  • 1
    It really depends on your exact use case. If a few of thousands of numbers are missing, you could store *ranges* instead of each number individually. – MC Emperor Feb 20 '23 at 10:01
  • It sounds like you just need to do a binary search but your phrasing makes it unclear what you are really trying to do. Please include an [mre]. – vsfDawg Feb 20 '23 at 12:30

3 Answers3

1

I just tried to see if breaking down the tasks as much as I can using Fork Join as a fun exercise to get to know the library better (also because I thought dividing the task down into smaller tasks and processing it parallelly would take less time) and contrasted it with a simple for loop.

public class misc {
    public void getMissingNumbers(int[] numbers){
        for (int i=0; i<numbers.length -1; i++){
            int current = numbers[i];
            int next = numbers[i+1];
            if(current+1 != next){
                System.out.println("Problem! - "+current+" "+next);
            }
        }
    }
     
     public static void main(String []args){
         int[] range = IntStream.rangeClosed(1, 50_000_000).toArray();
         int index = 50000;
         range[index] =  range[index-1];  //duplicate
         index = 390;
         range[index] =  range[index-1];
         index = 500390;
         range[index] =  range[index-1];
         index = 2500390;
         range[index] =  range[index-1];
         
         ZonedDateTime now = ZonedDateTime.now();
         misc m = new misc();
         m.getMissingNumbers(range);
         System.out.printf("%s exec time: %dms\n",
                 m.getClass().getSimpleName(),
                 ChronoUnit.MILLIS.between(now, ZonedDateTime.now()));
         
         now = ZonedDateTime.now();
         ForkJoinPool forkJoinPool = ForkJoinPool.commonPool();
         breakDownRecursively bdr = new breakDownRecursively(range);
         forkJoinPool.invoke(bdr);
         System.out.printf("%s exec time: %dms\n",
                 bdr.getClass().getSimpleName(),
                 ChronoUnit.MILLIS.between(now, ZonedDateTime.now()));
     }
}

class breakDownRecursively extends RecursiveAction {
    private final int[] arr;
    private final ArrayList<Integer> arrlst = new ArrayList<>();
    
    public breakDownRecursively(int[] arr) {
        this.arr = arr;
    }
    
    public void compute() {
        int n = arr.length;
        if (arr.length < 2) return;
        int mid = arr.length / 2;

        int[] left = new int[mid];
        System.arraycopy(arr, 0, left, 0, mid);

        int[] right = new int[arr.length - mid];
        System.arraycopy(arr, mid, right, 0, arr.length - mid);

        invokeAll(new breakDownRecursively(left), new breakDownRecursively(right));
        compare(left, right);
    }
    
    private void compare(int[] left, int[] right) {
        if (left.length == 1 && right.length == 1) {
            if (left[0]+1 != right[0]) {
                //System.out.println("Problem! - "+left[0]+" "+right[0]);
            }
        }
    }
}

Output:

Problem! - 390 390
Problem! - 390 392
Problem! - 50000 50000
Problem! - 50000 50002
Problem! - 500390 500390
Problem! - 500390 500392
Problem! - 2500390 2500390
Problem! - 2500390 2500392
misc exec time: 60ms
Problem! - 390 392
Problem! - 500390 500392
Problem! - 2500390 2500392
breakDownRecursively exec time: 2435ms

I suppose I probably made a mistake somewhere during implementation of the fork join, but at the very least you should see that a for loop isn't THAT bad.

and when I used Runnable:

     int mid = range.length/2;
     int[] half1 = new int[mid+1];
     System.arraycopy(range, 0, half1, 0, mid+1);
     int[] half2 = new int[mid];
     System.arraycopy(range, mid, half2, 0, range.length - mid);
     RunnableTask r1 = new RunnableTask(half1);
     RunnableTask r2 = new RunnableTask(half2);
     now = ZonedDateTime.now();
     Thread t1 = new Thread(r1);
     Thread t2 = new Thread(r2);
     
     t1.start();
     t2.start();
     t1.join();
     t2.join();
     
     System.out.printf("%s exec time: %dms\n",
             r1.getClass().getSimpleName(),
             ChronoUnit.MILLIS.between(now, ZonedDateTime.now()));

class RunnableTask implements Runnable{
    private final int[] arr;
    public RunnableTask(int[] arr) {
        this.arr = arr;
    }
    @Override
    public void run() {
        // TODO Auto-generated method stub
        for (int i=0; i<arr.length -1; i++){
            int current = arr[i];
            int next = arr[i+1];
            if(current+1 != next){
                System.out.println("Problem! - "+current+" "+next);
            }
        }
    }
    
}

Output:

Problem! - 390 390
Problem! - 390 392
Problem! - 50000 50000
Problem! - 50000 50002
Problem! - 500390 500390
Problem! - 500390 500392
Problem! - 2500390 2500390
Problem! - 2500390 2500392
RunnableTask exec time: 49ms

Only slightly better than a for loop.

  • Thanks for the answer @experimentunit1998X. I was unable to understand the second approach that you used. Also, I was unable to understand the output. For example `Problem! - 390 390`, the first statement in the output, why are both of the numbers same? – Sambhav Khandelwal Feb 20 '23 at 15:58
  • 1
    Did you read the code? They created an array of 50 mil elements in the top of main. `System.out.println("Problem! - "+current+" "+next);` You can see that the current array element (`390`) is equal to the next array element (`390`). Or the current array element (`390`) is not equal to 1 less than the next array element (`392`). therefore, there is a missing number (`391`). The point of this code is to show that a loop finished in 60ms, and a Runnable finished in 49ms. You seem to be of the mindset that a loop is bad. They are proving it isn't. – tbatch Feb 20 '23 at 16:57
  • I probably messed up with the second approach, but what I really wanted to do was something like using the runnable, split it up into lists of certain length and process each. The problem with the second approach I was not able to resolve is that at the end of the day, it will "merge" back together the broken up parts and will process already processed array elements – experiment unit 1998X Feb 21 '23 at 01:07
  • the problem is that the current element and the next are duplicates, and on the next iteration, the current element is not 1 less than the next element, so the two errors come one after another – experiment unit 1998X Feb 21 '23 at 01:08
  • I get it now. Sorry for the confusion. I have upvoted you answer but feel that I cant yet accept it as I find another answer which does the same task in comparatively less time. But, your program uses less memory compared to that one. Thanks for the answer! – Sambhav Khandelwal Feb 21 '23 at 16:31
1

A binary search benefits from being able to cut a problem space in half, and then eliminating one of the halves. In this case, any half that contains both a missing value and a duplicate is indistinguishable from one that doesn't, no matter how many additional duplicates exist, so you'd end up having to process both halves.

Millions of integer comparisons requires very little compute time. A linear solution would still be very fast, and in this case, is as efficient as you can be on a worst-case basis.

I ran the code multiple times below on my desktop, and came up with an average of about 5ms to process an array of 10 million elements, and in all cases, it found the results in under 10ms.

public class Millions {

    public static int[] fillArray(int size) {
        int[] ar=new int[size];
        int randomPos=(int)(Math.random()*size);
        System.out.println("Placing missing value at position " + randomPos);
        int nextNum=1;
        for (int i=0; i<size; i++) {
            if (i==randomPos) {
                nextNum+=2;
            } else {
                if (Math.random() > 0.999995) {
                    System.out.println("Placing duplicate value at position " + i);
                } else {
                    nextNum++;
                }
            }
            ar[i] = nextNum;
        }
        return ar;
    }

    public static int missingValue(int[] ar) {
        for (int i=1; i<ar.length; i++) {
            if (ar[i]-ar[i-1]==2) return ar[i]-1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int SIZE=10000000;
        int[] ar=fillArray(SIZE);
        long start=System.currentTimeMillis();
        int missing=missingValue(ar);
        long duration=System.currentTimeMillis()-start;
        if (missing<0) {
            System.out.println("No missing value found.");
        } else {
            System.out.println("Missing value = " + missing);
        }
        System.out.println("Duration : " + duration + " ms");
    }
}
phatfingers
  • 9,770
  • 3
  • 30
  • 44
  • Thanks for the answer @phatfingers. I run it on a online java compiler and found something fishy. The first statement I find `Placing missing value at position 8534196` which is pointing to index number 85,34,196 but `Missing value = 55`. How can index 8534196 be 55?? I couldnt quite get it. Sorry if its a silly thing to ask but I couldn't get onto it. – Sambhav Khandelwal Feb 21 '23 at 15:23
  • You found a bug. I just fixed it. – phatfingers Feb 21 '23 at 16:22
  • Now it works fine. But, the memory it takes is 73948 kilobytes when running on jdooddle.com. Its speed is good tho going from 2 milliseconds to 29 milliseconds – Sambhav Khandelwal Feb 21 '23 at 16:27
  • Can it be made more efficient to the memory? – Sambhav Khandelwal Feb 21 '23 at 16:28
  • If you're receiving the data as a stream, then yes. You don't need the whole dataset all at once just to compare the most recent 2 values. – phatfingers Feb 21 '23 at 16:33
0

As far as I know, there is no way to detect a missing number in a list except for looping through.

If your array is sorted, it should look like something like this:

[1,2,3,3,4,6]

So, this code should do the work:

int getMissingNumber(int[] numbers){
    for (int i=0; i<numbers.length -1; i++){
        int current = numbers[i];
        int next = numbers[i+1];
        if(next - current > 1){
            return current + 1;
        }
    }
    return -1;
}

Other than this, there is the option of changing the Array to the Set and then back to Array and then, use the previous approach. Be sure to use LinkedHashSet to preserve insertion order. But I don't know if it would be any faster.

Ali
  • 117
  • 9