7

I have an array of Strings:

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

What is the quickest/most efficient way to order this into a smaller Collection in order of how frequent each String is with its frequency?

I though about using the String as a key in a HashMap<String,Integer> but this wouldnt be sorted in terms of frequency

My other method i considered is using a TreeMap<Integer, String[]> with a list of Strings with that integer, but there seems a lot of checking involved..

Im trying to avoid using more than one loop If possible as my String arrays will be much larger than the one above. Thanks!

EDIT What i want is just to be able to output the Strings in order of frequency and preferably be able to pair that String with its frequency in the array, So for example two output arrays:

["x", "y", "z", "a"]
[3,2,1,1]

This would be quite a simple problem if speed wasnt an issue which is why i ask the great minds on here :)

Eduardo
  • 6,900
  • 17
  • 77
  • 121
  • You can use a `HashMap`. Keep every string as a key and add `1` to the value each time you get a key. Creating your result collection would be nothing more than ordering by value and adding your key value-times (If key `x` has value `5`, print `x` 5 times). – Jeroen Vannevel Sep 06 '13 at 14:55
  • The first answer in this question should give you an idea of how it can be done:http://stackoverflow.com/questions/6712587/counting-frequency-of-characters-in-a-string – Paddyd Sep 06 '13 at 14:57

7 Answers7

10

You can solve this in two steps:

  1. Create a counter object - a Map<String, Integer> listing for each string the number of times it appears in the input: in other words, it's a frequency map. This is O(n), as you only need to traverse the input once for building the map

  2. With the previous map, create a list with its keys, sorted using the frequency of items (the values in the map) as ordering criteria. This is O(n log n), and you can call Collections.sort(), with a Comparator that uses the string frequency for the comparisons

This is what I mean:

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

final Map<String, Integer> counter = new HashMap<String, Integer>();
for (String str : stringArray)
    counter.put(str, 1 + (counter.containsKey(str) ? counter.get(str) : 0));

List<String> list = new ArrayList<String>(counter.keySet());
Collections.sort(list, new Comparator<String>() {
    @Override
    public int compare(String x, String y) {
        return counter.get(y) - counter.get(x);
    }
});

After the above code executes, the variable list will contain the following values (the order between elements of the same frequency is unspecified):

[x, y, a, z]

It's trivial to convert the list to an array:

list.toArray(new String[list.size()])

And if you need to find out the frequency of each string, just iterate over the sorted keys:

for (String str : list) {
    int frequency = counter.get(str);
    System.out.print(str + ":" + frequency + ", ");
}
Óscar López
  • 232,561
  • 37
  • 312
  • 386
  • What about the ordering of the elements that have the same frequency? Your comparator ensures that elements with the same frequency are alphabetically ordered? – Watchmaker Nov 05 '16 at 19:58
  • Read it again: _"The order between elements of the same frequency is unspecified."_ – Óscar López Nov 05 '16 at 20:06
  • Sorry about that, but in an example like [this one](http://www.java-fries.com/2015/02/sort-elements-frequency/) that ordering of the elements with the same frequency seems to be implicit and it is not completely clear to me why. – Watchmaker Nov 05 '16 at 20:22
3

Use the HashMap<String,Integer> to maintain your counts. This will be the most efficient way to process the arbitrary list of strings.

Create an ArrayList<Map.Entry<String,Integer>> from the map's entrySet().

Sort this list using a Collections.sort() and a custom comparator.

Don't get hung up on micro-optimizations.

parsifal
  • 174
  • 3
2

If third-party libraries are fair game, the following one-liner with Guava is asymptotically optimal:

Multisets.copyHighestCountFirst(ImmutableMultiset.copyOf(array))
   .elementSet().toArray(new String[0]);
Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • Quoting from elementSet() documentation: "The order of the elements in the element set is unspecified". Though the above code works, a safer option would be something like this: Multisets.copyHighestCountFirst(ImmutableMultiset.copyOf(array)).stream().distinct().collect(...) – Jacob Eckel May 09 '16 at 09:41
  • @JacobEckel, `Multisets.copyHighestCountFirst` returns an `ImmutableMultiset`, which does have deterministic ordering. (And for what it's worth, I wrote much of this documentation.) – Louis Wasserman May 09 '16 at 15:26
1
String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

List<String> list = Arrays.asList(stringArray);
Collections.sort(list);

HashMap<String, Integer> map = new HashMap<String, Integer>();

for(int i = 0; i < list.size();) {

    String s = list.get(i); //get the string to count

    int count = list.lastIndexOf(s) - list.indexOf(s) + 1; //count it

    map.put(s, count); // add it

    i = list.lastIndexOf(s) + 1; // skip to the next string

}

I would consider this as an elegant solution but i don't know how performant that is. If you wnat it sorted use a TreeMap, but that is really slow.

You can sort it afterwards like this:

TreeMap<String, Integer> sortedMap = new TreeMap<String, Integer>(unsortedMap);

But note that having Integer as key is not working! Because a key is unique and if for example a and b appear one time, a will be kicked out!

Philipp Sander
  • 10,139
  • 6
  • 45
  • 78
  • I considered using Integer as the key and having an Array/List of Strings as the Value for each String that had that Integer Frequency. You would have to remove it from the list of one and add it to the list of another though and i dont know how efficient that is – Eduardo Sep 06 '13 at 15:33
  • if you know how fast this is, could you tell me? i'm curious – Philipp Sander Sep 06 '13 at 15:36
  • Ill compare it with the answer above and let you know for a long string array – Eduardo Sep 06 '13 at 15:37
  • `Arraylist.indexOf()` searches linearly. So the order of your algorithm becomes _O(nlogn+n^2)_ => _O(n^2)_. This can be improved. – rocketboy Sep 06 '13 at 15:59
  • Sad to say its slower than the one above :( For an array of about 500 Strings it was 2x slower although for less than 100ish Strings it was faster than above – Eduardo Sep 06 '13 at 16:01
1

Print result: 1)string with different occurrence sorted in desc order. 2)string with same occurrence sorted by char in asce order.

 public static void sortStringByOccurance(String[] stringArray) {
    // O(n)
    Map<String, Integer> map = new HashMap<>();
    for (String str : stringArray) {
        map.put(str, map.containsKey(str)? map.get(str)+1 : 1);
    }

    // O(n)
    TreeMap<Integer, TreeSet<String>> treemap = new TreeMap<>();
    for (String key : map.keySet()) {
        if (treemap.containsKey(map.get(key))) {
            treemap.get(map.get(key)).add(key);
        }
        else {
            TreeSet<String> set = new TreeSet<>();
            set.add(key);
            treemap.put(map.get(key), set);
        }
    }

    // O(n)
    Map<Integer, TreeSet<String>> result = treemap.descendingMap();
    for (int count : result.keySet()) {
        TreeSet<String> set = result.get(count);
        for (String word : set) {
            System.out.println(word + ":" + count);
        }
    }
}
Zuwen Gan
  • 11
  • 1
  • Your second loop is O(n log n), not O(n), since each TreeMap operation (implemented as an RB tree) is only guaranteed to run in O(log n). – MRA Mar 21 '16 at 16:41
0

was possible with least lines of code:

String[] s = {"x", "y", "z", "x", "x", "y", "a"};
HashMap<String,Integer> hm = new HashMap<String,Integer>();
for(int i=0;i<s.length;i++){
    int count = hm.containsKey(s[i]) ? hm.get(s[i]) : 0;
    hm.put(s[i], count + 1);            
}
Prashant
  • 13
  • 5
0

Another solution :

String[] s = {"x", "y", "z", "x", "x", "y", "a"};
HashMap<String,Integer> hm = new HashMap<String,Integer>();

for(int i=0;i<s.length;i++){
    hm.putIfAbsent(s[i], 0);
    hm.put(s[i], hm.get(s[i]) + 1);
}
System.out.println(hm);
Prashant
  • 13
  • 5