1

I am working on an assignment and don't get answer for some of questions.

I Have been asked :

Input: an array A of length N that can only contain integers from 1 to N

Output: TRUE - A contains duplicate, FALSE - otherwise.

I have created a class which is passing my test cases.

public class ArrayOfIntegers {


public boolean isArrayContainsDuplicates(int [] intArray){
    
    int arraySize = intArray.length;
    
    long expectedOutPut = arraySize*(arraySize+1)/2;
            
    long actualOutput = 0;
    
    for(int i =0 ; i< arraySize; i++){
        
        actualOutput =  actualOutput + intArray[i];
        
    }
            
    if(expectedOutPut == actualOutput)
        return false;
    
    return true;
}   

}

Now further questions on this

  1. Is it possible to provide the answer and NOT to destroy the input array A?

    I have not destroy an array. So what I have done is correct?

  2. Analyze time and space complexity of your algorithm?

    So do I need to write something about the for loop that as soon as I find the duplicate elements I should break the loop. Frankly speaking I am not very clear about the concept of time and space complexity.

  3. Is O(n) for both time and space possible?

    is this should be No as n could be any number. Again , I am not very clear about O(n).

Thanks

Community
  • 1
  • 1
Harry
  • 4,705
  • 17
  • 73
  • 101
  • 1
    So many "check for duplicates in an array" homework questions, so little time. – Robert Harvey Jun 20 '12 at 22:57
  • 1
    You may be missing an assumption: Your array can contain elements from 1 to N, but what if in a list of 5 elements, they skipped values? EX: {1, 2, 2, 4, 5}. – Makoto Jun 20 '12 at 22:57
  • Micro optimisation actualOutput = actualOutput + intArray[i]; can be written as actualOutput += intArray[i]; – n00begon Jun 20 '12 at 22:58
  • 1
    @Makoto: +1 In fact, if that is not the case, there will be no duplicate. – Thilo Jun 20 '12 at 22:58
  • @Thilo: I was just thinking that. How would you have a duplicate if every value was consecutive? Unless you ran into an array of the same value no larger than N. – Makoto Jun 20 '12 at 22:59
  • @RobertHarvey At least he has a solution to the problem, and a concrete question: "Can you help me understand algorithmic complexity?" Here's a start: http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o – dlev Jun 20 '12 at 22:59
  • 1
    What you're doing right now is adding all the elements and seeing if the result is what you would expect (n(n+1)/2). This will not always work; try the test case {1, 2, 2, 5, 5} to see. – arshajii Jun 20 '12 at 23:04
  • Are we allowed to help a "homework" tagged question with the code? – JHS Jun 20 '12 at 23:06
  • 1
    @Juniad: See [How to ask and answer homework questions?](http://meta.stackexchange.com/q/10811/175248) – Makoto Jun 20 '12 at 23:07
  • @Makoto - The answer in the link u shared was too long... But I guess I got the hint... ;-) – JHS Jun 20 '12 at 23:12

5 Answers5

2

Is it possible to provide the answer and NOT to destroy the input array A?

Yes. For example, if you don't care about the time it takes, you can loop over the array once for every possible number and check if you see it exactly once (if not, there must be a duplicate). That would be O(N^2).

Usually, you would use an additional array (or other data structure) as a scratch-list, though (which also does not destroy the input array, see the third question below).

Analyze time and space complexity of your algorithm?

Your algorithm runs in O(n), doing just a single pass over the input array, and requires no additional space. However, it does not work.

Is O(n) for both time and space possible?

Yes.

Have another array of the same size (size = N), count in there how often you see every number (single pass over input), then check the counts (single pass over output, or short-circuit when you have an error).

So do I need to write something about the for loop that as soon as I find the duplicate elements I should break the loop.

No. Complexity considerations are always about the worst case (or sometimes the average case). In the worst case, you cannot break out of the loop. In the average case, you can break out after half the loop. Either way, while being important for someone waiting on a real-life implementation to finish the calculation, this does not make a difference for scalability (complexity as N grows infinite). Constant offsets and multipliers (such as 50% for breaking out early) are not considered.

Thilo
  • 257,207
  • 101
  • 511
  • 656
0

As hints:

  1. Yes, it is possible to provide an answer and not destroy the array. Your code* provides an example.

  2. Time complexity can be viewed as, "how many meaningful operations does this algorithm do?" Since your loop goes from 0 to N, at minimum, you are doing O(N) work.

    Space complexity can be viewed as, "how much space am I using during this algorithm?" You don't make any extra arrays, so your space complexity is on the order of O(N).

  3. You should really revisit how your algorithm is comparing the numbers for duplicates. But I leave that as an exercise to you.

*: Your code also does not find all of the duplicates in an array. You may want to revisit that.

Community
  • 1
  • 1
Makoto
  • 104,088
  • 27
  • 192
  • 230
  • @Makoto - The space complexity is not O(n). – Travis J Jun 20 '12 at 23:30
  • I disagree. 4*N is the minimum amount of space you're using with every array location (not counting references), and that would scale linearly with N. – Makoto Jun 20 '12 at 23:32
  • @Makoto - If you made an extra array of size N then the space complexity would be O(n). However, since there is no space required for the algorithm to run aside from the long variables and int variables, the space complexity is only as big as those four variables will get. This size will not be nearly as large as the size of the original array which is not contained in the algorithm and therefore does not affect the space complexity. – Travis J Jun 20 '12 at 23:40
0
public boolean hasDuplicates(int[] arr) {
    boolean found = false;
    for (int i = 1 ; i <= arr.length ; i++) {
        for (int a : arr) 
            if (a == i) found = true;
        if (! found) return true;
    }
    return false;
}

I believe this method would work (as yours currently doesn't). It's O(n^2).

I'm quite sure that it is impossible to attain O(n) for both time and space since two nested for-loops would be required, thereby increasing the method's complexity.

Edit

I was wrong (sometimes it's good to admit it), this is O(n):

public boolean hasDuplicates(int[] arr) {
    int[] newarr = new int[arr.length];
    for (int a : arr) newarr[a - 1]++;
    for (int a : newarr) if (a != 1) return true;
    return false;
}
  1. Yes, the input array is not destroyed.
  2. The method directly above is O(n) (by that I mean its runtime and space requirements would grow linearly with the argument array length).
  3. Yes, see above.
arshajii
  • 127,459
  • 24
  • 238
  • 287
  • There may be a way to do it without nested for-loops, but you'd have to get really tricky. Non-nested for-loops would give you a complexity of O(k*N), for some number of loops k. – Makoto Jun 20 '12 at 23:20
  • Ah I think i figured out a way to do it without the nested loops – arshajii Jun 20 '12 at 23:21
  • Hi A.R.S Thanks for your answer but this is not working for {1,6,4,2,5} – Harry Jun 20 '12 at 23:37
  • 1
    That's an invalid case according to your problem description. Shouldn't the array contain only integers 1 - N? – arshajii Jun 20 '12 at 23:38
  • By the way it could be made potentially faster if you replace "(a > 1)" in the if-statement with "(a != 1)". – arshajii Jun 20 '12 at 23:41
0

It's possible by adding all of the elements to a hashset = O(n), then comparing the number of values in the hashset to the size of the array = O(1). If they aren't equal, then there are duplicates.

Creating the hashset will also take up less space on average than creating an integer array to count each element. It's also an improvement from 2n to n, although that has no effect on big-O.

WLPhoenix
  • 294
  • 4
  • 17
-1

1) This will not require much effort, and leaves the array intact:

 public boolean isArrayContainsDuplicates(int [] intArray){
  int expectedOutPut = (intArray.length*(intArray.length+1))/2;
  int actualOutput = 0;
  for(int i =0 ; i < intArray.length; i++){
   if(intArray[i]>intArray.length)return true;
   actualOutput += intArray[i];
  }
  return expectedOutPut == actualOutput ? false: true;
 }

2) This will require touching a varying amount of elements in the Array. Best case, it hits the first element which would be O(1), average case is it hits in the middle O(log n), and worse case is it goes all the way through and returns false O(n).

O(1) refers to a number of operations which are not related to the total number of items. In this case, the first element would be the only one which has this case.

O(log n) - log is a limiting factor which can be a real number from 0 to 1. Thus, multiplying by log will result in a smaller number. Hence, O(log n) refers to a required amount of operations which are less than the number of items.

O(n) - This is when it required a number of operations equal to the number of items.

These are all big-o notation for the required time.

This algorithm uses memory which will increase as n increases. However, it will not grow linearly but instead be a fraction of the size n is and therefore its spacial Big-O is O(log n).

3) Yes, this is possible - however, it is only possible in best-case scenarios.

Travis J
  • 81,153
  • 41
  • 202
  • 273
  • Your code will fail on `{1, 2, 2, 4, 5}`, since none of those elements are greater than 5. – Makoto Jun 20 '12 at 23:14
  • Nope. In addition, your assumption about memory usage is off; you're not using a constant amount of memory for some N. – Makoto Jun 20 '12 at 23:23
  • @Makoto - You are correct, it should be log n as it is not going to scale linearly but it will be varying based on n. What set would break the method though? – Travis J Jun 20 '12 at 23:24
  • Why would the space complexity be log(N)? The set that breaks the method could be any non-linear set - `{1, 1, 1, 1, 1}`. The way you check to see what you expect is by doing a summation on N, and 5 != 15. – Makoto Jun 20 '12 at 23:26
  • @Makoto - 5 != 15 therefore the method returns `true`, as in there are duplicates. – Travis J Jun 20 '12 at 23:27
  • @Makoto - The space complexity will not grow linearly with n. In fact, if n were 1,000,000 then the memory would not be 1,000 times greater than if n were 1,000. This is a log n relationship. – Travis J Jun 20 '12 at 23:28
  • Your use of the ternary statement threw me off; why not `return !(expectedOutPut == actualOutput)`? So that much is correct. Anyway, there's a difference between holding an array of 100 ints, an array of 1,000 ints, and an array of 1,000,000 ints. Every space in the array of type int adds an extra 4 bytes to your memory usage at least. It **isn't** a logarithmic relation. – Makoto Jun 20 '12 at 23:31
  • @Makoto - There is no memory to this algorithm aside from the original set, and the three variables used (the three ints). The original set is **not** considered as a requirement for space complexity as it is already in memory. – Travis J Jun 20 '12 at 23:33
  • I'm not buying it, but I don't want this comment chain to get too much longer. It'd be nice if you could provide a proof or a reference in your post as to where you get that value from. – Makoto Jun 20 '12 at 23:39
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/12828/discussion-between-makoto-and-travis-j) – Makoto Jun 20 '12 at 23:42