Optimize insertion from ArrayList to HashMap

Question

I'm trying to insert data from ArrayList to HashMap<String, Language> optimally.

Many items may have the same languge_name (code below), so I need to group items having the same language in Language class and store languages in a HashMap with the name of the language as a Key.

Item

String name;
String language_name;

Language

String language_name;
int numberItems; 
LinkedList<String> Items;

I solved this as follows:

        ArrayList<Item> items; // given array of items
        HashMap<String, Language> languages = new HashMap<String, Language>();

        items.forEach(item -> {
            /** case 1: language isn't specified */
            if (item.getLanguageName() == null) {
                item.setLanguageName("unknown");
            }
            /** case 2: language already added */
            if (languages.containsKey(item.getLanguageName())) {
                languages.get(item.getLanguageName()).getItems().add(item.getName());
                languages.get(item.getLanguageName())
                        .setNumberItems(languages.get(item.getLanguageName()).getNumberItems() + 1);
            } else {
                /** case 3: language isn't added yet */
                LinkedList<String> languageItems = new LinkedList<String>();
                languageItems.add(item.getName());
                Language language = new Language(item.getLanguageName(), 1, languageItems);
                languages.put(item.getLanguageName(), language);
            }
        });

Any help would be appreciated!

Why do you need a manual counter `int numberItems;` in your Language class? You can allways just call `.size()` on your list if you need to know how many items it contains. Adding an additional counter that you manually increment seems kind of redundant. — OH GOD SPIDERS, Aug 05 '20 at 14:36
Learn about the power of local variables. They allow you to perform an operation like `get(item.getLanguageName())` only once, instead of three times in a row. You can even omit the fourth hash lookup made in `if(languages.containsKey(item.getLanguageName()))`, by doing `Language lang = get(item.getLanguageName());` first, followed by `if(lang != null) { lang.getItems().add(item.getName()); lang.setNumberItems(lang.getNumberItems() + 1); } else …` though, as the previous commenter said, updating this counter is obsolete. And [don’t use `LinkedList`](https://stackoverflow.com/q/322715/2711488)… — Holger, Aug 06 '20 at 10:28

score 1 · Answer 1 · answered Aug 05 '20 at 14:49

1

Assuming you're using Java 8 or later, this can be accomplished nicely with built-in stream functions.

HashMap<String, List<Items>> itemsGroupedByLanguage =
 items.stream().collect(Collectors.groupingBy(Items::getLanguage));

answered Aug 05 '20 at 14:49

Anonymous Beaver

370
2
10

2

But the result is supposed to be `HashMap`, not `HashMap>`. – Holger Aug 06 '20 at 10:30
Yes, the result is HashMap, there is redundant in HashMap>. Stream collecting is readable but does not respond to my need – dev1 Aug 06 '20 at 13:54
Just wondering, but what is the purpose of putting it in that particular format? For example, since all the information is contained in the language object, is there a reason that it's a hashmap and not List – Anonymous Beaver Aug 08 '20 at 22:58

Pankaj · Answer 2 · 2020-08-13T07:17:42.503

tl;dr

It's not possible to achieve what you desire using Java (8+) inbuilt collector, but you can write your own custom collector and write code like below to collect into a map as -

Map<String, Language> languages = items.stream().collect(LanguageCollector.toLanguage());

Let's first look at Collector<T, A, R> interface

public interface Collector<T, A, R> {
    /**
     * A function that creates and returns a new mutable result container.
     */
    Supplier<A> supplier();

    /**
     * A function that folds a value into a mutable result container.
     */
    BiConsumer<A, T> accumulator();

    /**
     * A function that accepts two partial results and merges them.  The
     * combiner function may fold state from one argument into the other and
     * return that, or may return a new result container.
     */
    BinaryOperator<A> combiner();

    /**
     * Perform the final transformation from the intermediate accumulation type
     */
    Function<A, R> finisher();

    /**
     * Returns a  Set of Collector.Characteristics indicating
     * the characteristics of this Collector.  This set should be immutable.
     */
    Set<Characteristics> characteristics();
}

Where T is the generic type of the items in the stream to be collected. A is the type of the accumulator, the object on which the partial result will be accumulated during the collection process. R is the type of the object (typically, but not always, the collection) resulting from the collect operation

Now let's look at the custom LanguageCollector

  public class LanguageCollector
      implements Collector<Item, Map<String, Language>, Map<String, Language>> {

    /**
     * The supplier method has to return a Supplier of an empty accumulator - a parameterless
     * function that when invoked creates an instance of an empty accumulator used during the
     * collection process.
     */
    @Override
    public Supplier<Map<String, Language>> supplier() {

      return HashMap::new;
    }

    /**
     * The accumulator method returns the function that performs the reduction operation. When
     * traversing the nth element in the stream, this function is applied with two arguments, the
     * accumulator being the result of the reduction (after having collected the first n–1 items of
     * the stream) and the nth element itself. The function returns void because the accumulator is
     * modified in place, meaning that its internal state is changed by the function application to
     * reflect the effect of the traversed element
     */
    @Override
    public BiConsumer<Map<String, Language>, Item> accumulator() {

      return (map, item) -> {
        if (item.getLanguageName() == null) {
          item.setLanguageName("unknown");
        } else if (map.containsKey(item.getLanguageName())) {
          map.get(item.getLanguageName()).getItems().add(item.getName());
          map.get(item.getLanguageName())
              .setNumberItems(map.get(item.getLanguageName()).getNumberItems() + 1);
        } else {
          Language language = new Language(item.getLanguageName(), 1);
          language.add(item.getName());
          map.put(item.getLanguageName(), language);
        }
      };
    }

    /**
     * The combiner method, return a function used by the reduction operation, defines how the
     * accumulators resulting from the reduction of different subparts of the stream are combined
     * when the subparts are processed in parallel
     */
    @Override
    public BinaryOperator<Map<String, Language>> combiner() {
      return (map1, map2) -> {
          map1.putAll(map2);
          return map1;
       };
    }

    /**
     * The finisher() method needs to return a function which transforms the accumulator to the
     * final result. In this case, the accumulator is the final result as well. Therefore it is
     * possible to return the identity function
     */
    @Override
    public Function<Map<String, Language>, Map<String, Language>> finisher() {
      return Function.identity();
    }

    /**
     * The characteristics, returns an immutable set of Characteristics, defining the behavior of
     * the collector—in particular providing hints about whether the stream can be reduced in
     * parallel and which optimizations are valid when doing so
     */
    @Override
    public Set<Characteristics> characteristics() {
      return Collections.unmodifiableSet(
          EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    /**
     * Static method to create LanguageCollector
     */
    public static LanguageCollector toLanguage() {
      return new LanguageCollector();
    }
  }

I have modified your classes at little bit to (to follow the naming convention and more for readable accumulator operation).

Class Item

public class Item {
    private String name;
    private String languageName;

    public Item(String name, String languageName) {
      this.name = name;
      this.languageName = languageName;
    }
    //Getter and Setter
  }

Class Language

public class Language {
    private String languageName;
    private int numberItems;
    private LinkedList<String> items;

    public Language(String languageName, int numberItems) {
      this.languageName = languageName;
      this.numberItems = numberItems;
      items = new LinkedList<>();
    }

    public void add(String item) {
      items.add(item);
    }

    // Getter and Setter

    public String toString() {
      return "Language(languageName=" + this.getLanguageName() + ", numberItems=" + this.getNumberItems() + ", items=" + this.getItems() + ")";
    }
  }

Running code

public static void main(String[] args) {
    List<Item> items =
        Arrays.asList(
            new Item("ItemA", "Java"),
            new Item("ItemB", "Python"),
            new Item("ItemC", "Java"),
            new Item("ItemD", "Ruby"),
            new Item("ItemE", "Python"));

    Map<String, Language> languages = items.stream().collect(LanguageCollector.toLanguage());

    System.out.println(languages);
  }

prints

{Java=Language(languageName=Java, numberItems=2, items=[ItemA, ItemC]), Ruby=Language(languageName=Ruby, numberItems=1, items=[ItemD]), Python=Language(languageName=Python, numberItems=2, items=[ItemB, ItemE])}

For more information please read book 'Modern Java in Action: Lambdas, streams, functional and reactive programming' chapter 6.5 or check this link

Don’t specify the `CONCURRENT` characteristic when your collector uses `HashMap`. The class `HashMap` is not thread safe and your collector is not a concurrent collector. Besides that, the merge function is unnecessarily complicated. It could be just `(map1, map2) -> { map1.putAll(map2); return map1; }`. — Holger, Aug 11 '20 at 14:08
But this all looks very much like an already existing builtin collector, so I suggests to reconsider whether it really is impossible with a builtin collector: `Collectors.toMap(item -> item.getLanguageName(), item -> new Language(item.getLanguageName(), 1, new ArrayList<>(Arrays.asList(item))), (lang1, lang2) -> { lang1.getItems().addAll(lang2.getItems()); return lang1; })` or `Collectors.groupingBy(item -> item.getLanguageName(), Collectors.collectingAndThen(Collectors.toList(), list -> new Language(list.get(0).getLanguageName(), list.size(), list)))` — Holger, Aug 11 '20 at 14:11
Thanks @Holger for going through long post :-). I agree with you on comment 1 and made changes accordingly. About comment 2, let me try and check — Pankaj, Aug 13 '20 at 07:19

Optimize insertion from ArrayList to HashMap

2 Answers2