0

I have several lists of varying size, each index of the list contains both a key and an object : list1.add('Key', obj).

The lists are all sorted.

My aim is to iterate through the list and match 1 or more items in list 2,3,4 or n to an item in the mainList using the key.

Currently I do something along the lines of:

for i to list1 size
   for j to list2 size
   if list1[i] = list2[j]
      do stuff 

As I loop through I'm, using boolean values to exit quickly using a if current != previous check and I'm deleting the object from the list I take it from.

it's working fine but I now have another list that I need to match and possible another n lists afterwards. The lists are of different sizes.

the 2 options that I can see are to either repeat the above segment of code several times where the inner list is changed - I do no like this approach.

The other option is to extend the above and once one inner loop is finished, move onto the next:

  for i to list1 size
       for j to list2 size
       if list1[i] = list2[j]
          do stuff 
       for k to list2 size
       if list1[i] = list2[k]
          do stuff 

I'd like to think I'm correct in thinking that the 2nd is more efficient however I'm unsure. Also, is there a better way?

Thanks for any advice / help.

null
  • 3,469
  • 7
  • 41
  • 90
  • They both have the same time complexity. The first option is `O(i*j) + O(i*k)`, the second is `O(i*(j+k))`. – Barmar Dec 04 '14 at 11:35
  • @Barmar : Thank you.Is the complexity n^2 then? I thought that seeing as on some occasions it would be constant that it might improve but, complexity analysis is not my strong point :) – null Dec 04 '14 at 11:38
  • This question appears to be off-topic because cs.stackexchange.com would be a better place for questions like this. – Barmar Dec 04 '14 at 11:40
  • @Barmar : thanks, I didn't realise it would be considered off topic. I'll move it now unless you want to post an answer as per the comment – null Dec 04 '14 at 11:41

1 Answers1

1

If the lists are all sorted then you only need to iterate though each list once; on each iteration of the main list, iterate through a secondary list (starting at the previously saved index, initialized to 0) until you find an index whose value is greater than the current value of the main list, save this index, and proceed to the next secondary list.

Array<Integer> indices = new Array(n-1); // indices for every list except list1
for(int i = 0; i < indices.size; i++) {
  indices[i] = 0;
}

for(int i = 0; i < list1.size; i++) {
  Value curVal = list1[i];
  while(indices[0] < list2.size && list2[indices[0]] <= curVal) {
    if(list2[indices[0]] == curVal) {
      // do stuff on list2[indices[0]]
    }
    indices[0]++;
  }
  while(indices[1] < list3.size && list3[indices[1]] < curVal) {
    if(list3[indices[1]] == curVal) {
      // do stuff on list3[indices[1]]
    }
    indices[1]++;
  }
  // etc
}

You can avoid the copy-pasting by using something like a ListIterator that contains a list and its current index; then on each iteration of the main loop you'll iterate through a list of ListIterators in lieu of the copy-pasted code block

public class ListIterator {
  int index = 0;
  List<Value> list;
  Value curVal() {
    return list[index];
  }
  boolean hasNext() {
    return (index < list.size);
  }
}

List<ListIterator> iterators;
for(int i = 0; i < list1.size; i++) {
  Value curVal = list1[i];
  for(int j = 0; j < iterators.size; j++) {
    ListIterator iterator = iterators[j];
    while(iterator.hasNext() && iterator.curVal() <= curVal) {
      if(iterator.curVal() == curVal) {
        // do something with iterator.curVal()
      }
      iterator.index++;
    }
  }
}

This is time complexity O(n) where n is the sum of the lengths of all of your lists


Edit: If it's difficult to compare keys via <=, then you can use a Set implementation instead. Add the List1 keys to a Set, then iterate through the remaining lists testing for set membership.

Set<String> set = new Set(List1);
Array<List> lists = new Array();
// add lists to Array<List>

for(int i = 0; i < lists.size; i++) {
  List currentList = lists[i];
  for(int j = 0; j < currentList.size; j++) {
    if(set.contains(currentList[j]) {
      // do something
    }
  }
}
Zim-Zam O'Pootertoot
  • 17,888
  • 4
  • 41
  • 69
  • Hi, I was just looking at implementing your solution. Unfortunatley I do not have access to iterators; the joy of an old version of Delphi. Where you do '&& list2[indices[0]] <= curVal' is where I'm currently failing. The key that I have(curVal) is alphanumeric and not sequential. By that I mean it's ordered but there are gaps for example I have ABC765 and AC3223 that are next to each other. Is there a way to use this type of key in your solution? – null Dec 12 '14 at 12:26
  • @SteveGreen How are the keys ordered? Given your example, it looks like it may be "if the alpha prefix of key1 is less than the alpha prefix of key2 then key1 < key2, else if the alpha prefixes are equal and the numeric suffix of key1 is less than the numeric suffix of key2 then key1 < key2", in which case `list2[indices[0]] <= curVal` would become a more complicated `isLessThan(list2[indices[0]], curVal)` where you split and compare the keys – Zim-Zam O'Pootertoot Dec 12 '14 at 14:28
  • @SteveGreen If it's not feasible to write an `isLessThan` method, then if you have access to a `Set` data structure (this can be implemented as a `Map` or `Dictionary` with null values) then you can add all of the List1 keys to the set, then iterate through all of the other lists, testing for set membership. `list2[indices[0]] <= curVal` becomes `set.contains(list2[i])` - you no longer need `indices` since you can iterate through each secondary list all at once. This is still O(n) on average – Zim-Zam O'Pootertoot Dec 12 '14 at 14:42
  • Thank you for the suggestions and advice. The keys are not totally reliable I'm afraid, some are alpharnumer and others are not. The numerics come first in the sort folowwed by the remainder. I think your suggestion of a dictionary would be the best way forward but, I've not written one before so i gues it's time to take the plunge :) Any hints? – null Dec 12 '14 at 15:15
  • 1
    @SteveGreen [This answer](http://stackoverflow.com/a/3690631/1827903) provides a Delphi implementation of a good hash function. Allocate an array called `set` that's larger than List1.size, generally 25% to 100% larger depending on your memory constraints (the larger the array, the fewer collisions). This array contains linked lists of strings. When adding a string to the array, calculate `int key = hash(string) modulo set.size`, and append the string to the linked list at `set[key]`. Lookups do the same - calculate the key, then traverse the linked list for an exact match – Zim-Zam O'Pootertoot Dec 12 '14 at 15:27
  • 1
    @SteveGreen Using a linked list to handle collisions is a "closed addressing" scheme; an "open addressing" scheme is one where, upon detecting a collision, you would add the string to an adjacent array element (for some definition of "adjacent" - different open addressing schemes use different probing techniques). See the Wikipedia article on [hopscotch hashing](http://en.wikipedia.org/wiki/Hopscotch_hashing) for a specific open addressing scheme. – Zim-Zam O'Pootertoot Dec 12 '14 at 15:31