Arrays.sort() vs sorting using map

Question

I have a requirement where I have to loop through an array which has list of strings:

String[] arr = {"abc","cda","cka","snd"}

and match the string "bca", ignoring the order of the characters, which will return true as it’s present in the array ("abc").

To solve this I have two approaches:

Use Arrays.sort() to sort both the strings and then use Arrays.equals to compare them.
create 2 hashmaps and add frequency of each letter in string and then finally compare two map of char using equals method.

I read that complexity of using Arrays.sort() method is more. So, thought of working on 2nd approach but when I am running both the code 1st approach is taking very less time to execute program.

Any suggestions why this is happening?

"very less time" how much (in absolute value) and with how big an array? — Federico klez Culloca, Aug 21 '20 at 07:34
Array has 5 elements and I am running this program 50000 times to check performance , for 1st approach it is 500 and for 2nd it is 1185 — Bhumi, Aug 21 '20 at 07:36
If you use the example you posted here (4 strings of length 3 each), time complexity is irrelevant. Time complexity actually starts being a factor when the input is much bigger. — amit, Aug 21 '20 at 07:37
input can vary it could be 100 chars also then will it change ? — Bhumi, Aug 21 '20 at 07:42
in 2nd approach there are two loops this could be the reason ? — Bhumi, Aug 21 '20 at 07:45
Sorry - why don't you just linearly search through the array for the target string? Am I missing something? — KidCoder, Aug 21 '20 at 07:49
I can't use equals here cos order of string is not same. fox example in array it is "cab" and I am searching for "abc". then I will have to run two nested loops 1 is to iterate through array of strings and other to iterate characters in that string — Bhumi, Aug 21 '20 at 10:31
The time complexity only tells you, how the approach will scale with (significantly) larger input. It doesn’t tell you which approach is faster. For you small strings and arrays, it is not surprising that the hash approach is slower. You’ll need really large inputs to see the effects of the different time complexity. — Holger, Aug 21 '20 at 10:36
So rephrase your question. You said: "match the string "bca" which will return true as its present in array" which implies the String equality, not of all its characters separately. — Nikolas Charalambidis, Aug 21 '20 at 11:06
@Nikolas your Answer was very helpful, why did u delete it ? plus yes it is char comparison for that I will have to use 2 loops correct ? — Bhumi, Aug 21 '20 at 11:39
Please post your attempt with the maps of char, so we can check if you've done it correctly — fps, Aug 21 '20 at 14:32

score 3 · Answer 1 · answered Aug 21 '20 at 14:49

The Time Complexity only tells you, how the approach will scale with (significantly) larger input. It doesn’t tell you which approach is faster.

It’s perfectly possible that a solution is faster for small input sizes (string lengths and/or array length) but scales badly for larger sizes, due to its Time Complexity. But it’s even possible that you never encounter the point where an algorithm with a better Time Complexity becomes faster, when natural limits to the input sizes prevent it.

You didn’t show the code of your approaches, but it’s likely that your first approach calls a method like toCharArray() on the strings, followed by Arrays.sort(char[]). This implies that sort operates on primitive data.

In contrast, when your second approach uses a HashMap<Character,Integer> to record frequencies, it will be subject to boxing overhead, for the characters and the counts, and also use a significantly larger data structure that needs to be processed.

So it’s not surprising that the hash approach is slower for small strings and arrays, as it has a significantly larger fixed overhead and also a size dependent (O(n)) overhead.

So first approach had to suffer from the O(n log n) time complexity significantly to turn this result. But this won’t happen. That time complexity is a worst case of sorting in general. As explained in this answer, the algorithms specified in the documentation of Arrays.sort should not be taken for granted. When you call Arrays.sort(char[]) and the array size crosses a certain threshold, the implementation will turn to Counting Sort with an O(n) time complexity (but use more memory temporarily).

So even with large strings, you won’t suffer from a worse time complexity. In fact, the Counting Sort shares similarities with the frequency map, but usually is more efficient, as it avoids the boxing overhead, using an int[] array instead of a HashMap<Character,Integer>.

score 1 · Answer 2 · answered Aug 21 '20 at 07:55

1

Approach 1: will be O(NlogN)

Approach 2: will be O(N*M), where M is the length of each string in your array.

You should search linearly in O(N):

for (String str : arr) {
    if (str.equals(target)) return true;
}
return false;

answered Aug 21 '20 at 07:55

KidCoder

363
5
18

`if (str.equals(target)) ` : Will `equals` say "abc" and "bca" are equal? – adarsh Aug 23 '20 at 03:58

Peter Walser · Answer 3 · 2020-08-21T15:09:25.107

Let's decompose the problem:

You need a function to sort a string by its chars (bccabc -> abbccc) to be able to compare a given string with the existing ones.

Function<String, String> sortChars = s -> s.chars()
        .sorted()
        .mapToObj(i -> (char) i)
        .map(String::valueOf)
        .collect(Collectors.joining());

Instead of sorting the chars of the given strings anytime you compare them, you can precompute the set of unique tokens (values from your array, sorted chars):

Set<String> tokens = Arrays.stream(arr)
        .map(sortChars)
        .collect(Collectors.toSet());

This will result in the values "abc","acd","ack","dns".

Afterwards you can create a function which checks if a given string, when sorted by chars, matches any of the given tokens:

Predicate<String> match = s -> tokens.contains(sortChars.apply(s));

Now you can easily check any given string as follows:

boolean matches = match.test("bca");

Matching will only need to sort the given input and do a hash set lookup to check if it matches, so it's very efficient.

You can of course write the Function and Predicate as methods instead (String sortChars(String s) and boolean matches(String s) if you're unfamiliar with functional programming.

score 0 · Answer 4 · answered Aug 21 '20 at 14:57

More of an addendum to the other answers. Of course, your two options have different performance characteristics. But: understand that performance is not necessarily the only factor to make a decision!

Meaning: if you are talking about a search that runs hundreds or thousands of time per minute, on large data sets: then for sure, you should invest a lot of time to come up with a solution that gives you best performance. Most likely, that includes doing various experiments with actual measurements when processing real data. Time complexity is a theoretical construct, in the real world, there are also elements such as CPU cache sizes, threading issues, IO bottlenecks, and whatnot that can have significant impact on real numbers.

But: when your code will doing its work just once a minute, even on a few dozen or hundred MB of data ... then it might not be worth to focus on performance.

In other words: the "sort" solution sounds straight forward. It is easy to understand, easy to implement, and hard to get wrong (with some decent test cases). If that solution gets the job done "good enough", then consider to use use that: the simple solution.

Performance is a luxury problem. You only address it if there is a reason to.

Arrays.sort() vs sorting using map

4 Answers4