20

I am new to Java and practicing by creating a simplistic NaiveBayes classifier. I am still new to object instantiation, and wonder what I need to do to initialize a HashMap of HashMaps. When inserting new observations into the classifier, I can create a new inner HashMap for an unseen feature name in a given class, but do I need to initialize the outer map itself?

import java.util.HashMap;

public class NaiveBayes {

    private HashMap<String, Integer> class_counts;
    private HashMap<String, HashMap<String, Integer>> class_feature_counts;

    public NaiveBayes() {
        class_counts = new HashMap<String, Integer>();
        // do I need to initialize class_feature_counts?
    }

    public void insert() {
        // todo
        // I think I can create new hashmaps on the fly here for class_feature_counts
    }

    public String classify() {
        // stub 
        return "";
    }

    // Naive Scoring:
    // p( c | f_1, ... f_n) =~ p(c) * p(f_1|c) ... * p(f_n|c)
    private double get_score(String category, HashMap features) {
        // stub
        return 0.0;
    }

    public static void main(String[] args) {
        NaiveBayes bayes = new NaiveBayes();
        // todo
    }
}

Note this question is not specific to Naive Bayes classifiers, just thought I would provide some context.

dimo414
David Williams
  • 1
    "_...do I need to initialize?_" **Yes**. – jlordo Mar 25 '13 at 22:49
  • 3
    And a HashMap of HashMaps is usually a sign that you lack objects and encapsulation. – JB Nizet Mar 25 '13 at 22:52
  • Cool, I can appreciate that. Do you have some advice? The way I have been thinking about this is having a two-level hash, so for example, if this were spam detection I could have `{ spam : { "bank account" : 3, "viagra" : 9 } }`. What are your thoughts? – David Williams Mar 26 '13 at 00:44

4 Answers

24

Yes, you need to initialize it.

class_feature_counts = new HashMap<String, HashMap<String, Integer>>();

When you want to add a value to class_feature_counts, you need to instantiate that inner map too:

HashMap<String, Integer> val = new HashMap<String, Integer>();
// Do what you want to do with val
class_feature_counts.put("myKey", val);
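On Java 8 or later, the "create the inner map if it is missing, then update it" step can be written without an explicit null check. The following is only a sketch under that assumption; the class name NestedCounts and its insert and count methods are hypothetical names chosen for illustration, not part of the question's code.

```java
import java.util.HashMap;
import java.util.Map;

class NestedCounts {
    // The outer map is created once, up front; inner maps are created on demand.
    private final Map<String, Map<String, Integer>> classFeatureCounts = new HashMap<>();

    // Record one observation of `feature` in `category`.
    void insert(String category, String feature) {
        classFeatureCounts
            .computeIfAbsent(category, k -> new HashMap<>()) // new inner map only if absent
            .merge(feature, 1, Integer::sum);                // start at 1 or add 1
    }

    // Return the stored count, or 0 if the category or feature was never seen.
    int count(String category, String feature) {
        Map<String, Integer> inner = classFeatureCounts.get(category);
        return inner == null ? 0 : inner.getOrDefault(feature, 0);
    }

    public static void main(String[] args) {
        NestedCounts counts = new NestedCounts();
        counts.insert("spam", "viagra");
        counts.insert("spam", "viagra");
        counts.insert("spam", "bank account");
        System.out.println(counts.count("spam", "viagra")); // prints 2
        System.out.println(counts.count("ham", "viagra"));  // prints 0
    }
}
```

The outer map still has to exist before insert is called; only the inner maps are created lazily here.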
BobTheBuilder
  • 1
    Hi @BobTheBuilder, I have a question regarding what to do after this point. Suppose in the above example I was to then retrieve my HashMap that was stored in class_feature_counts under the key "myKey". I would do `class_feature_counts.get("myKey")`. However, that returns an "Object" object, not a "HashMap" object. How do I cast that Object object to HashMap? Thanks. – kranberry Sep 05 '17 at 04:13
  • 1
    You should use: `HashMap> nestedAdBook = new HashMap>();` – BobTheBuilder Sep 25 '17 at 15:39
  • @BobTheBuilder is this still true in 2017 with Java 8 / 9? Do inner types in a Map need to be instantiated? Such as `Map>`. Is it safe to add to the list right away or not? – fIwJlxSzApHEZIl Oct 17 '17 at 23:21
  • 1
    You still need to instantiate it. – BobTheBuilder Oct 18 '17 at 07:50
13

Recursive generic data structures, like maps of maps, while not an outright bad idea, are often indicative of something you could refactor: the inner map often could be a first-class object (with named fields or an internal map), rather than simply a map. You'll still have to initialize these inner objects, but it is often a much cleaner, clearer way to develop.

For instance, if you have a Map<A,Map<B,C>> you're often really storing a map of A to Thing, but the way Thing is being stored is coincidentally a map. You'll often find it cleaner and easier to hide the fact that Thing is a map, and instead store a Map<A,Thing>, where Thing is defined as:

public class Thing {
    // Map is guaranteed to be initialized if a Thing exists
    private Map<B,C> data = new HashMap<B,C>();

    // operations on data, like get and put
    // now can have sanity checks you couldn't enforce when the map was public
}
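
Applied to the question's feature counts, the Thing idea might look like the sketch below. FeatureCounts and CountsByClass are hypothetical names invented for this illustration, and the code assumes Java 8 or later for computeIfAbsent, merge, and getOrDefault.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Thing: the feature counts of one class.
class FeatureCounts {
    // Guaranteed to be initialized whenever a FeatureCounts exists.
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    void increment(String feature) {
        // A sanity check that a bare Map<String, Integer> could not enforce.
        if (feature == null || feature.isEmpty()) {
            throw new IllegalArgumentException("feature must be non-empty");
        }
        counts.merge(feature, 1, Integer::sum);
        total++;
    }

    int get(String feature) {
        return counts.getOrDefault(feature, 0);
    }

    int total() {
        return total;
    }
}

// Hypothetical owner of the outer map: class name -> FeatureCounts.
class CountsByClass {
    private final Map<String, FeatureCounts> byClass = new HashMap<>();

    void observe(String category, String feature) {
        byClass.computeIfAbsent(category, k -> new FeatureCounts()).increment(feature);
    }

    int count(String category, String feature) {
        FeatureCounts fc = byClass.get(category);
        return fc == null ? 0 : fc.get(feature);
    }
}
```

Callers only see observe and count, so the inner storage can change later without touching them.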

Also, look into Guava's Multimap/Multiset utilities. They're very useful for cases like this; in particular, they do the inner-object initializations automatically. Of note for your case, just about any time you implement Map<E, Integer> you really want a Guava Multiset<E>, which is cleaner and clearer.

dimo414
  • Thanks for the tip, I will have a look at Guava's multisets. About recursive generic structures, what would you recommend? I know that I want to have a fixed, 2 level hash of hashes, as in the spam detection comment above. Would you recommend creating something like a featureCount class and then create a HashMap of those? – David Williams Mar 26 '13 at 00:48
  • 1
    Yes, that's the general idea, updated my answer. Guava helps solve a ton of common Java problems, definitely check it out. – dimo414 Mar 26 '13 at 01:37
  • 1
    Just Googled "java HashMap of HashMap" in hopes of a suggestion for a less awkward design. Now that I see this, it's kind of a "duh" moment, but this is perfect. Thank you! – sfarbota Feb 23 '14 at 19:50
2

You must create an object before using it via a reference variable. It doesn't matter how complex that object is. You aren't required to initialize it in the constructor, although that is the most common case. Depending on your needs, you might want to use "lazy initialization" instead.
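
As a rough illustration of lazy initialization, the sketch below leaves the field null until it is first needed. The class LazyCounts and its methods are hypothetical names, the code assumes Java 8 or later for merge and getOrDefault, and this simple form is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

class LazyCounts {
    // Starts as null: no map object exists until the first write.
    private Map<String, Integer> counts;

    // Create the map on first use, then keep returning the same instance.
    private Map<String, Integer> map() {
        if (counts == null) {
            counts = new HashMap<>();
        }
        return counts;
    }

    void increment(String key) {
        map().merge(key, 1, Integer::sum);
    }

    // Reads do not force allocation; an absent map simply means a count of 0.
    int get(String key) {
        return counts == null ? 0 : counts.getOrDefault(key, 0);
    }

    boolean isAllocated() {
        return counts != null;
    }
}
```

Every access has to go through map() or handle the null case itself, which is why initializing eagerly is usually the simpler choice for a small map.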

Code-Apprentice
2
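  1. Do not declare your variables as HashMap; it's too limiting. Declare them as the Map interface, so the implementation can change later without touching the rest of the code.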
  2. Yes, you need to initialize class_feature_counts. You'll be adding entries to it, so it has to be a valid map. In fact, initialize both at declaration and not in the constructor since there is only one way for each to start. I hope you're using Java 7 by now; it's simpler this way.

    private Map<String, Integer> classCounts = new HashMap<>();

    private Map<String, Map<String, Integer>> classFeatureCounts = new HashMap<>();

The compiler will infer the type arguments for the diamond operator <> from the declared type of each field. Also, I changed the variable names to standard Java camel-case style. Are classCounts and classFeatureCounts connected?
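
To show why declaring against Map is less limiting, here is a small sketch in which the same helper works with both a HashMap and a TreeMap. The class InterfaceTyped and the helper bump are hypothetical names used only for this example, and merge assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class InterfaceTyped {
    // Accepts any Map implementation, not just HashMap.
    static void bump(Map<String, Integer> counts, String key) {
        counts.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>();
        Map<String, Integer> sorted = new TreeMap<>(); // keys kept in sorted order

        bump(hashed, "viagra");
        bump(sorted, "viagra");
        bump(sorted, "bank account");

        System.out.println(hashed.get("viagra")); // prints 1
        System.out.println(sorted.keySet());      // prints [bank account, viagra]
    }
}
```

Had bump taken a HashMap parameter, the TreeMap call would not compile.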

Eric Jablow
  • In some sense yes, they are connected, they each hold information about counts of occurrences of certain strings, for example, see chapter 6 of [Programming Collective Intelligence](http://shop.oreilly.com/product/9780596529321.do). Thanks for the heads up about camel casing. – David Williams Mar 26 '13 at 00:27