2

This a new version of this post in order to isolate the programming question from the probability question.

I want to store a number of e.g. 25 randomly generated numbers between 1 and 365 in an array. But i need to keep track of duplicates. Here's how i was thinking of doing it:

  • create 4 arrays: a master array, an array for 2 duplicates, an array for 3 duplicates and one for more than 3 duplicates

  • add each generated number one by one into the master array. But before doing so, loop through the array to see if it's in there already. If so, add it to the second array, but before doing so repeat the above process and so on

at the end of the process i could count the non-null values in each array to know how many unique numbers i have, how many came up twice etc

it doesn't seem to be a very efficient algorithm. Any suggestions to improve it?

Can my suggested approach be considered to be Big O(n) i.e. linear?

Community
  • 1
  • 1
  • Can you elaborate on "keep track of duplicates"? What information do you want to extract from this data? – Blorgbeard Feb 15 '11 at 09:51
  • Eg. are you estimating the probability that two random people share a birthday? – Blorgbeard Feb 15 '11 at 09:53
  • @Blorgbeard, i have reworded the post. i will generate 25 random numbers bewtween 1 and 365. I need to keep track of any number being generated twice or three times ore more. Of course i won't always get duplicates, and even less likely triplicates but i might...so i want to capture / count such occurrences –  Feb 15 '11 at 10:10
  • Do you need to preserve the order the numbers had been generated in? – biziclop Feb 15 '11 at 10:34
  • Also I believe your algorithm is technically O(1), but very inefficient. Since there are at most 365/6 distinct birthdays, each of your arrays will have a maximum size of 365 elements. Traversing an array of constant size is O(1) because it doesn't depend on the number of birthdays generated. That doesn't mean it's fast though. If you know you'll always generate < 365 random birthdays and want to analyse the algorithm in those terms, I think the complexity is O(n^2) - each new birthday is O(n), and you need to make 'n' of them. – John L Feb 15 '11 at 11:54

4 Answers4

2

Why are you using arrays? A HashMap or other map structure would seem to make more sense. Here's how I would do it.

  1. Instantiate a new, empty hashmap from birthdays to integers
  2. Generate a random birthday.
  3. Check if the birthday is in the hashmap. If it isn't, add it with a value of '1'. It it is, increment the value at that birthday.

Now you can get the number of unique dates generated by the number of keys in the hashmap, and any information about the number of duplicates from the values.

John L
  • 27,937
  • 4
  • 73
  • 88
  • thanks for the suggestion. I will need to learn about Hash Tables. found a good starting point on SO: http://stackoverflow.com/questions/2489169/hashtable-implementation-tutorial –  Feb 15 '11 at 10:50
  • @All, i will accept the answer that suggests the most computationally efficient approach. Will need to take some time to evaluate answers. Thanks for the participation so far! –  Feb 15 '11 at 10:51
  • I'd use a `HashMap` myself, but you can do it with a `Hashtable` if you prefer. And I wouldn't pre-initialise anything, just check every time a new number is generated if it's in the map already. If it is, increment its count, if there isn't, insert it with count 1. – biziclop Feb 15 '11 at 10:52
  • @biziclop - edited to suggest hashmap instead of hashtable; my Java's a bit rusty. And by `initialize` I just meant to instantiate the object, which I thought was clear from the "empty". – John L Feb 15 '11 at 10:54
  • @Baba - it seems that `HashTable` is deprecated, you should study `HashMap` instead (the accepted answer from your link covers this). Basically, you need to know that it's a map from keys (birthdays here) to values (here, the number of occurrences). You can look up the value by a given key, enumerate over all keys/values, and do a few other useful operations. – John L Feb 15 '11 at 10:59
1

Here is how I think the problem can be solved.

  1. Have two arrays: a master one containing the birthdays and one containing the number of times it has repeated.
  2. Generate a random birthday.
  3. Loop through the main array and see if it is already there.
  4. If it is not there, add it to the main array and in the same index position, store 1 in the other array. ( If you add the birthday to master[3], store 1 in number[3])
  5. If it is already there, add one to the corresponding index in the other array.
  6. Now your master array will contain the generated birthdays and the corresponding indexes in the other array will have the number of times it is repeated. (master[0] has a date and number[0] has the number of times the date is repeated).

Hope this helps.

Harish
  • 228
  • 1
  • 7
  • I like it - simple and efficient. Conducive to clear code which is good for homework (which I believe this question is..) – Blorgbeard Feb 15 '11 at 10:34
  • @Harish, yes that sounds much better than my in initial idea. thanks for that! any issues with the fact that some array positions may remain unpopulated? @Blogbeard, you're right, i have updated the post tags accordingly –  Feb 15 '11 at 10:37
  • @Baba, first populate both your arrays with zero and then start your process. It will be easier that way. – Harish Feb 15 '11 at 10:54
  • @SO people,if you have solved it,can you please post the working example so that it can be handy for further reference – Deepak Feb 18 '11 at 16:44
  • @Deepak, there is no ONE solution to this question. all suggestions posted here work. some are better than others in terms of efficiency. –  Feb 19 '11 at 14:25
  • @all, I will accept this answer as it gave me the quickest solution to my query. I will analyse the other answers to improve my knowledge of problem solving so thanks very much to all for your input –  Feb 24 '11 at 14:28
0

in ruby:

birthdays = {}.tap{|h| h.default = 0}
25.times { birthdays[rand(364)+1] += 1 }

puts birthdays.keys # all the uniq birthdays
puts birthdays.inspect # all birthdays with counts

idea is to use hash or fixed size array where keys correspond to birthdays and values are how many times that birthday was selected

problem is that you will lose order of birthdays as they are just keys in hash/array but if original order is not important you can always get random sequence of birthdays by doing

birthdays.keys.shuffle
keymone
  • 8,006
  • 1
  • 28
  • 33
0

Depending on how many times you will need to generate the random list you might consider creating a list of all the numbers and then using a shuffle sort to permute them, you can then pull consecutive blocks off the array which will be both in random order and unique. So in the worst case the performance would be O(size all numbers)+O(size of numbers needed), and best case is O(size of numbers needed). An example:

private int[] numbers;
private int numberIndex=Integer.MAX_VALUE;
private void initNumbers(){
  int[] numbers = new int[365];
  for(int i=0;i<365;i++){
    numbers[i] = i+1;
  }
  shuffleSort(numbers);
  numberIndex = 0;
}

public int[] getRandom(int howMany){
  if(numberIndex > numbers.length-howMany){
    initNumbers();
  }
  int[] ret = new int[howMany];
  for(int i=0;i<howMany;i++){
    ret[i] = numbers[numberIndex++];
  }
  return ret;
}

private void shuffleSort(int[] numbers){
  Random r = new Random();
  int tmp=0, swap=0;
  for(int i=numbers.length-1;i>0;i--){
    swap = r.nextInt(i);//random number between 0 and i
    tmp = numbers[i];
    numbers[i] = numbers[swap];
    numbers[swap] = tmp;
  }
}

}

M. Jessup
  • 8,153
  • 1
  • 29
  • 29