This would be the fastest way:
final int[] counts = new int[1 << 16];    // one counter slot per possible char value
for (char c : yourString.toCharArray())   // yourString: whatever String you're counting
    counts[c]++;
(I've just sketched out the part that iterates over all your chars; I believe that's the easy part, and not directly related to this question.)
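For completeness, reading the tallies back out is just as direct. This small sketch is my addition; it assumes counts was filled as above and prints every character that actually occurred:

for (int c = 0; c < counts.length; c++)
    if (counts[c] != 0)
        System.out.printf("U+%04X '%c': %d%n", c, (char) c, counts[c]);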
Benchmark results
I've pitted the HashMap approach against mine with three string lengths (a sketch of the kind of JMH harness involved follows the list):
- 10
- 1,000
- 100,000
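Something along these lines would produce numbers of that shape. This is a reconstruction with current JMH annotations and names of my own choosing, not the exact harness I ran (the output format below comes from an older JMH version), the test-string contents are an assumption, and counts.merge is just one reasonable way to write the HashMap variant:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class CharCountBenchmark {

    @Param({"10", "1000", "100000"})
    int length;

    String s;

    @Setup
    public void buildString() {
        // The original test data isn't described; random lowercase letters are an assumption.
        Random r = new Random(42);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.append((char) ('a' + r.nextInt(26)));
        s = sb.toString();
    }

    @Benchmark
    public int[] array() {
        final int[] counts = new int[1 << 16];
        for (char c : s.toCharArray())
            counts[c]++;
        return counts;   // returned so JMH doesn't dead-code-eliminate the work
    }

    @Benchmark
    public Map<Character, Integer> hashMap() {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray())
            counts.merge(c, 1, Integer::sum);
        return counts;
    }
}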
And these are the results:
Benchmark     Mode   Thr  Cnt  Sec      Mean  Mean error     Units
testArray1    thrpt    1    5    5     6.870       0.083  ops/msec
testArray2    thrpt    1    5    5     6.720       0.374  ops/msec
testArray3    thrpt    1    5    5     3.770       0.019  ops/msec
testHashMap1  thrpt    1    5    5  1269.123     251.766  ops/msec
testHashMap2  thrpt    1    5    5    12.776       0.165  ops/msec
testHashMap3  thrpt    1    5    5     0.141       0.005  ops/msec
What do they mean? Yes, zero-initializing a full 256 KB block of memory (65,536 four-byte ints) up front is costly. But once that price is paid, my array algorithm hardly even notices the thousands of characters whizzing by. The HashMap approach, on the other hand, is much faster for very short strings, but scales dramatically worse; I'd guess the crossover is at a string length of around 2,000 characters.
I suppose it is not disputed that such character-count statistics are usually run against massive text corpora, and not stuff like your name and surname.
Of course, the performance of the array approach can be improved substantially if you can assume that the input won't use the complete UTF-16 codepoint range. For example, with an array that accommodates only the lowest 1,024 codepoints, the performance rises to 470 ops/msec.
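A sketch of that trimmed-down variant, under the stated assumption that every character in the input is below codepoint 1,024 (the fallback remark in the comment is my suggestion, not something measured here):

final int[] counts = new int[1024];   // covers only char values below U+0400
for (char c : yourString.toCharArray()) {
    // Throws ArrayIndexOutOfBoundsException if the assumption is violated;
    // a real implementation might catch that and fall back to the full-size array.
    counts[c]++;
}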