Find frequency of a character in an array of Strings

Question

Given an array of Strings, find the frequency of occurrence of a particular character.

eg. Given array {"hon","bhig","zzz","hello"} and character 'h', the output is 3.

Here's how I solved it: Approach 1: Iterate through every string in the array, increment a counter every time that character occurs in the current String. Run time is O(n), where n is the cumulative length of all strings in the array.

Approach 2: This can be optimized using a HashMap; this is particularly helpful if the strings are repeated in the array. Here's what I did: take a HashMap where key = string and value = number of times that string occurs in the array. Put all strings in the given array into the HashMap along with their counts. Then iterate over each key-value pair in the HashMap, count the number of times the given character appears in the key(string) and increment it by its corresponding value in the HashMap.

My question is: Is there a better way to do this?

Here's the code:

NOTE: PLEASE READ THE ENTIRE ACCEPTED ANSWER.

public static int findFreq(String[] arr,char c) {
    Map<String,Integer> map  = new HashMap<String,Integer>();
    for(int i=0;i<arr.length;i++) {
        if(map.containsKey(arr[i])) 
            map.put(arr[i],map.get(arr[i])+1);
        else
            map.put(arr[i], 1);
    }
    int freq=0;
    for(Entry<String,Integer> entr:map.entrySet()) {
        String s = entr.getKey();
        for(int i=0;i<s.length();i++) {
            if(s.charAt(i)==c)
                freq += entr.getValue();
        }
    }
    return freq;
}

Seeing as you're going to have to look at every individual character in the array to solve this, you'll never do better than O(n). I don't see how doing a map by strings is all that helpful (in fact you don't need the map if you're never going to look at `arr` again). If you want to keep it, I'd map from each letter in the alphabet to the number of times it occurs (i.e., `h --> 3`). — musical_coder, Oct 16 '13 at 21:35
Computing the hashcode for a string involves looking at every letter. Granted that the hashcode may have already been computed once (and hence cached), the second approach is potentially considerably more work and (on the average) cannot be less work. There are savings only if string counts are significantly more than 1 each. — Ted Hopp, Oct 16 '13 at 21:41

score 3 · Answer 1 · answered Oct 16 '13 at 21:37

3

Sorry, I think Approach 2 slows things down. In order to add each string to the HashMap, the method computes the hash code, which looks at every character in the string. So setting up the HashMap already looks at every character in every string, which takes as long as what you'd have to do with approach 1, plus then you have to make another pass through the map.

answered Oct 16 '13 at 21:37

ajb

31,309
3
58
84

Plus the map only provides savings if strings are repeated in the array. The sample array OP posted would have no savings at all. – Ted Hopp Oct 16 '13 at 21:42

score 2 · Answer 2 · answered Oct 16 '13 at 21:37

Approach 1 is preferable here. The cost is O(N) to either of them in the worst case. The second approach using HashMap<String> for remembering old visited string (with inherent hashing cost) would not bring improvement to performance worthy to be mentioned. We should avoid premature optimization, as approach 1 is simpler.

Java Devil · Accepted Answer · 2013-10-16T23:19:42.490

Approach 2 is not very optimised, what you should really do is create a Map<Character,Integer> then you don't the second loop to count but you need to then loop each character in each String.

Approach 1, depending on your implementation also only counts for each character occurring in the String, does it consider if the character occurs twice, eg "hash"?

Either approach needs to compare EACH character in EACH String and then count

This is how approach 2 should be

public static int findFreq(String[] arr,char c) {
    Map<Character,Integer> map  = new HashMap<Character,Integer>();
    for(int i=0;i<arr.length;i++) {
        for(Character ch : arr[i].toCharArray()){
            if(map.containsKey(ch)) 
                map.put(ch,map.get(ch)+1);
            else
                map.put(ch, 1);
        }
    }
    return map.get(Character.valueOf(c));
 }

Either way both approaches will be O(n), from the docs for HashMap

This implementation provides constant-time performance for the basic operations (get and put)

But that said even with the approach I provided above this requires additional get when populating the map.

So Approach 1 is better if using for a single search, if using repeatedly then approach 2 is the way to go (but populate the map outside the method)

Some metrics for you:

Number of Words  |    Array (approach 1)   |   Map (My approach 2)  |  Map (your approach 2)
                 |       (time in ms)      |     (time in ms)       |      (time in ms) 
                 |     (groovy)/(java)     |     (groovy)/(java)    |     (groovy)/(java)     
-------------------------------------------------------------------------------------------
      43303      |         118 /  5        |         229 / 34       |             / 16     
     417221      |         852 / 10        |        1088 / 120      |             / 49
    2086705      |        2929 / 45        |        5064 / 731      |             / 219

I retract my method, it appears your Map approach is faster!

This was my array method (in case yours differs)

private static int findFreqArray(String[] arr, char c){
    int count = 0;
    for(int i=0;i<arr.length;i++) {
        for(char ch : arr[i].toCharArray()){
            if(ch == c)
                count++;
        }
    }
    return count;  
}

Thank you so much for the metrics, that really cleared up some doubts I had. And yes, as many suggested, approach 1 seems to be the fastest. — codewarrior, Oct 16 '13 at 23:07
Yeah appears that why, I was surprised that your map approach was quicker with 2 loops but now that I look at it, the call to `arr[i].toCharArray()` in my method would probably be what slows it down — Java Devil, Oct 16 '13 at 23:11
Well probably more like its mapping every character whereas the array method (and your map) only maps if the character matches. — Java Devil, Oct 16 '13 at 23:15

score 1 · Answer 4 · edited May 23 '17 at 12:21

1

Not necessarily. Yet another possibility would be to "flatten" your array into a single string and search for a single character in it (fast the same as your variant 1). This would maybe speed thinks a little bit, but it would not necessarily make the code "better". Example for a char search in a string can be found in this SO answer.

edited May 23 '17 at 12:21

Community

1
1

answered Oct 16 '13 at 21:36

keenthinker

7,645
2
35
45

score 1 · Answer 5 · answered Oct 16 '13 at 21:43

No, you'll never do better than O(n) for just one search. But if you're going to be searching many times against the same array, for different characters, you could start by running through the array and building a hash map from each character to its number of occurrences. Then, for each search, you just have to do a simple constant-time look-up, not an O(n) search.

score 1 · Answer 6 · answered Oct 16 '13 at 21:45

1

Hashmap is even more slow than the first one. Both algorithms needs to pass from each character once, so both needs O(n) time. But the first one is much simpler, and fewer lines of code would be executed.

Nice try though :)

answered Oct 16 '13 at 21:45

Jim Blum

2,656
5
28
37

Find frequency of a character in an array of Strings

6 Answers6