-2

I'm working on a little project in java, and I want to make my algorithm more efficient.

What I'm trying to do is check if a given string is present in an array of strings.

The thing is, I know a few ways to check if a string is present in an array of strings, but the array I am working with is pretty big (around 90,000 strings) and I am looking for a way to make the search more efficient, and the only ways I know are linear search based, which is not good for an array of this magnitude.

Edit: So I tried implementing the advices that were given to me, but the code i wrote accordingly is not working properly, would love to hear your thoughts.`

public static int binaryStringSearch(String[] strArr, String str) {
        
        int low = 0;
        int high = strArr.length -1;
        int result = -1;
        
        while (low <= high) { 
            int mid = (low + high) / 2;
            if (strArr[mid].equals(str)) {
                result = mid;
                return result;
            }else if (strArr[mid].compareTo(str) < 0) {
                low = mid + 1;
            }else {
                high = mid - 1;
            }   
        }   
        return result;
    }

Basically what it's supposed to do is return the index at which the string is present in the array, and if it is not in the array then return -1.

RONOLULU
  • 727
  • 1
  • 4
  • 12
  • Is the array sorted? If not you won't get anything better than a linear runtime. – Turamarth Dec 07 '20 at 12:47
  • Also, why are you worrying about the performance beforehand? Did you try to run it with the linear approach? Did you notice performance issues? – maloomeister Dec 07 '20 at 12:51
  • @Turamarth the array is sorted alphabetically. – RONOLULU Dec 07 '20 at 14:27
  • @maloomeister because for example if the given string is at the very end of the array then it will take over 80,000 iterations to get there. I'm pretty new to programming so I'm trying to learn more efficient ways to approach these kind of problems. – RONOLULU Dec 07 '20 at 14:27
  • @jschnasse because I'm new to java, currently I am only using java's basic commands to get things done, so to not get confused with all the different methods and approaches that exist outside of the basic java library – RONOLULU Dec 07 '20 at 14:29
  • @RONOLULU The post mentioned above shows the most basic options: 1. Search linear. 2. Sort first and do a quicksearch afterwards 3. Push your strings into a map or tree-like datastructure. All variants can be efficient. It all depends on your efficiency criteria. – jschnasse Dec 08 '20 at 07:48

3 Answers3

1

So you have a more or less fixed array of strings and then you throw a string at the code and it should tell you if the string you gave it is in the array, do I get that right? So if your array pretty much never changes, it should be possible to just sort them by alphabet and then use binary search. Tom Scott did a good video on that (if you don't want to read a long, messy text written by someone who isn't a native english speaker, just watch this, that's all you need). You just look right in the middle and then check - is the string you have before or after the string in the middle you just read? If it is already precisely the right one, you can just stop. But in case it isn't, you can eliminate every string after that string in case it's after the string you want to find, otherwise every string that's before the just checked string. Of course, you also eliminate the string itself if it's not equal because - logic. And then you just do it all over again, check the string in the middle of the ones which are left (btw you don't have to actually delete the array items, it's enough just to set a variable for the lower and upper boundary because you don't randomly delete elements in the middle) and eliminate based on the result. And you do that until you don't have a single string in the list left. Then you can be sure that your input isn't in the array. So this basically means that by checking and comparing one string, you can't just eliminate 1 item like you could with checking one after the other, you can remove more then half of the array, so with a list of 256, it should only take 8 compares (or 9, not quite sure but I think it takes one more if you don't want to find the item but know if it exists) and for 65k (which almost matches your number) it takes 16. That's a lot more optimised.

If it's not already sorted and you can't because that would take way too long or for some reason I don't get, then I don't quite know and I think there would be no way to make it faster if it's not ordered, then you have to check them one by one.

Hope that helped!

Edit: If you don't want to really sort all the items and just want to make it a bit (26 times (if language would be random)) faster, just make 26 arrays for all letters (in case you only use normal letters, otherwise make more and the speed boost will increase too) and then loop through all strings and put them into the right array matching their first letter. That way it is much faster then sorting them normally, but it's a trade-off, since it's not so neat then binary search. You pretty much still use linear search (= looping through all of them and checking if they match) but you already kinda ordered the items. You can imagine that like two ways you can sort a buncha cards on a table if you want to find them quicker, the lazy one and the not so lazy one. One way would be to sort all the cards by number, let's just say the cards are from 1-100, but not continuously, there are missing cards. But nicely sorting them so you can find any card really quickly takes some time, so what you can do instead is making 10 rows of cards. In each one you just put your cards in some random order, so when someone wants card 38, you just go to the third row and then linearly search through all of them, that way it is much faster to find items then just having them randomly on your table because you only have to search through a tenth of the cards, but you can't take shortcuts once you're in that row of cards.

1

Depending on the requirements, there can be so many ways to deal with it. It's better to use a collection class for the rich API available OOTB.

  1. Are the strings supposed to be unique i.e. the duplicate strings need to be discarded automatically and the insertion order does not matter: Use Set<String> set = new HashSet<>() and then you can use Set#contains to check the presence of a particular string.

  2. Are the strings supposed to be unique i.e. the duplicate strings need to be discarded automatically and also the insertion order needs to be preserved: Use Set<String> set = new LinkedHashSet<>() and then you can use Set#contains to check the presence of a particular string.

  3. Can the list contain duplicate strings. If yes, you can use a List<String> list = new ArrayList<>() to benefit from its rich API as well as get rid of the limitation of fixed size (Note: the maximum number of elements can be Integer.MAX_VALUE) beforehand. However, a List is navigated always in a sequential way. Despite this limitation (or feature), the can gain some efficiency by sorting the list (again, it's subject to your requirement). Check Why is processing a sorted array faster than processing an unsorted array? to learn more about it.

Arvind Kumar Avinash
  • 71,965
  • 6
  • 74
  • 110
0

You could use a HashMap which stores all the strings if

  • Contains query is very frequent and lookup strings do not change frequently.
  • Memory is not a problem (:D) .
Ashish Gupta
  • 112
  • 6