
I've never had occasion to write a hashcode function in Java but now I have a need to do so. How do I go about it?

It's for an ArrayList and each element contains 5 Strings and nothing else.

I found an example for an ArrayList whose elements contain 2 strings, and it's very simple:

return 31 * lastName.hashCode() + firstName.hashCode();

Can I get away with something equally simple, namely:

return 31 * field1.hashCode() + field2.hashCode() + field3.hashCode() + field4.hashCode() + field5.hashCode();

Or does a hashCode() method have further requirements?

I found another StackOverflow discussion of hashCode() here: Best implementation for hashCode method

From that, I imitated one of the answers and came up with this:

return Objects.hash(this.mClientCode, this.mOrderNumber, this.mOrderDate, this.mTicketsSold, this.mSellerName);

Is that better than the first one I suggested? Why?

Since hashCode() and equals() should apparently always get changed at the same time, this is my equals():

    public boolean equals(Object o) {
        if (!(o instanceof SalesItem)) {
            return false;
        }

        SalesItem n = (SalesItem) o;

        return n.mClientCode.equals(mClientCode) && n.mOrderNumber.equals(mOrderNumber) &&
                n.mOrderDate.equals(mOrderDate) && n.mTicketsSold.equals(mTicketsSold) &&
                n.mSellerName.equals(mSellerName);
    }

Does that look okay?

Henry

2 Answers


Your equals is almost right. If none of those values can be null, it's good. If they can be, then you need to add null checks as well: if ((n.lastName != null && n.lastName.equals(lastName)) || (n.lastName == null && lastName == null)), and repeat for the others.
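If nulls do have to be tolerated, the repeated check can be written with `java.util.Objects.equals`, which returns true when both arguments are null and false when exactly one is. The following is only a sketch: `Person` is a hypothetical two-field stand-in for the real class, not code from the question.

```java
import java.util.Objects;

// Hypothetical two-field class used only to illustrate the pattern.
final class Person {
    private final String firstName; // may be null in this sketch
    private final String lastName;  // may be null in this sketch

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Person)) {
            return false; // instanceof is false for null, so this covers o == null
        }
        Person other = (Person) o;
        // Objects.equals(a, b): true if both are null, false if exactly one is,
        // otherwise a.equals(b).
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts null arguments; a null contributes 0.
        return Objects.hash(firstName, lastName);
    }
}
```

If the fields are guaranteed non-null, the plain `equals` calls from the question do the same job without the extra null handling.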

For the hash, what you want is for the hash to be as randomly distributed as possible and unique for the values you would consider unique. It's hard for us to tell you a good hash algorithm because we don't know how your data structure is used. For example, if there are only 4 sellers, you'd want that field to be a very small factor in the hash, if a factor at all.
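To make the distribution point concrete, here is a small sketch comparing the five-field sum from the question with `Objects.hash`. Both satisfy the `hashCode` contract; the difference is that the plain sum ignores the order of the last four fields, so two objects with those values swapped always collide. The class and method names are made up for the illustration.

```java
import java.util.Objects;

final class HashComparison {
    // The formula from the question: only the first field is multiplied by 31.
    static int summed(String f1, String f2, String f3, String f4, String f5) {
        return 31 * f1.hashCode() + f2.hashCode() + f3.hashCode()
                + f4.hashCode() + f5.hashCode();
    }

    // Objects.hash folds each field in as result = 31 * result + hash,
    // so the position of a value affects the outcome.
    static int combined(String f1, String f2, String f3, String f4, String f5) {
        return Objects.hash(f1, f2, f3, f4, f5);
    }

    public static void main(String[] args) {
        // Same five values, with the second and third swapped.
        System.out.println(
                summed("x", "a", "b", "y", "z") == summed("x", "b", "a", "y", "z"));     // prints true
        System.out.println(
                combined("x", "a", "b", "y", "z") == combined("x", "b", "a", "y", "z")); // prints false
    }
}
```

Whether those extra collisions matter depends on the data; if swapped values never occur in practice, the simpler formula costs nothing.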

Is this a representation of a database row? It looks like one. If it is, the rowId or a UUID for the row would be the best thing to hash.

Gabe Sechan
  • `Objects.equals` is typically preferable to complicated inline comparisons. – chrylis -cautiouslyoptimistic- Aug 01 '18 at 05:19
  • @chrylis Objects.equals(lastname, n.lastname) is equivalent to lastname.equals(n.lastname), but requires an additional function call with no additional clarity, is actually wordier, and doesn't allow you to control whether to accept nulls. I wouldn't reject a code review that did it, but I find there's between 0 and slightly negative value in adding it – Gabe Sechan Aug 01 '18 at 05:25
  • Either nulls should be permitted at this class level or they shouldn't; in the case where they are, you have to check them, and `Objects.equals` does that inline for you. – chrylis -cautiouslyoptimistic- Aug 01 '18 at 05:47
  • @chrylis But if nulls aren't allowed, then I don't want the performance hit of checking them. Which adds up, especially if you're called in a loop; there's a reason why many IDEs ask if you want to add the null checks when generating an equals function. And quite truthfully, I have no idea if Objects.equals checks or not. I'd need to look up how it handles nulls (I assume you're right, but if I was reading it in code I'd need to check the docs). Which is another reason not to use it: I'd rather have it explicit. – Gabe Sechan Aug 01 '18 at 05:50
  • @Gabe Sechan Yes, this ArrayList imitates a MySQL table. I've defined the table so that the primary key is the combination of ClientCode and OrderNumber and prohibited nulls in all five columns of the table. However, this ArrayList is used in an Android app that is going to insert, update, and delete the rows that go into the table. I *think* I am precluding insert and update logic inserting any nulls into the ArrayList and table (or changing existing non-nulls to nulls) so any nulls that get into the ArrayList are going to be due to sloppy coding on my part. ;-) – Henry Aug 01 '18 at 18:42
  • @Henry Since you have a primary key, and primary keys are unique, I would only include those two values in the hash. The other values shouldn't matter. In fact, you probably wouldn't want them in- if you update the ticketsSold you still want it to hash to the same value, so it would go to the same bucket in a hashmap. Generally you want any field you put into the hash to be immutable, by convention if not actually forcing it. – Gabe Sechan Aug 01 '18 at 18:47
  • @Gabe Sechan (Ran out of room in previous comment). So, under those circumstances, are you saying the hash should only be based on the ClientCode and OrderNumber? ClientCode will be limited to about 8 values although that might grow slightly. OrderNumber will keep growing. It's a 6 digit number, the lowest of which is 100000 and I suspect it'll take decades to grow over 6 digits; currently, the highest OrderNumber for the oldest client is near 250 000. – Henry Aug 01 '18 at 18:48
  • @Henry Use the hash of the order number, rather than the direct numerical value. But yes. I'd actually question though: is the primary key really the pair of fields, or is it just the order number? Do you really have cases where you have 2 rows with the same order number but different customer codes? – Gabe Sechan Aug 01 '18 at 18:50
  • @Henry I'm actually going to backtrack on the equals function a bit too. If you were to compare two objects with the same primary key values (order# and client code), would you want them to be considered equal by an algorithm that goes and checks equality, like say an algorithm to find an item in a list? If you would, then your equals function should only include those fields. If you wouldn't, then add them all. That's more a "how are you using this object" question. – Gabe Sechan Aug 01 '18 at 18:55
  • @Gabe Sechan I don't know for a fact that we have the same order number on orders that have different client codes but I see no reason why it couldn't happen. We essentially have a fully separate system for each client and I'm going to be surprised if they all get their order numbers from the same pool. (I can't tell since it's a purchased system that I can't look inside.) It's far safer to assume that an order number is only unique within a given client code than that it is unique across all client codes. – Henry Aug 01 '18 at 18:57
  • @Henry Ok, if it would make logical sense in your system to have that possibility then it really is a two field primary key and you need both in your hash. – Gabe Sechan Aug 01 '18 at 18:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/177222/discussion-between-henry-and-gabe-sechan). – Henry Aug 01 '18 at 18:59
  • @Gabe Sechan I guess you didn't want to get dragged into a chat ;-) I just wanted to confirm a couple of things and conclude this. Based on our discussion here, I've modified my hashcode() method to this: return 31*mOrderNumber.hashCode() + mClientCode.hashCode(); Let me know if that's wrong. – Henry Aug 01 '18 at 19:26
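As a rough sketch of where the comment thread ends up, assuming the identity of a row really is the (client code, order number) pair: both `equals` and `hashCode` use only those two fields, so changing the tickets sold leaves the hash untouched. Only three of the five fields are shown, every field is assumed to be a String, and "ACME" and the other values are invented sample data.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch only: identity is assumed to be (client code, order number).
final class SalesItem {
    private final String mClientCode;  // key field, never reassigned
    private final String mOrderNumber; // key field, never reassigned
    private String mTicketsSold;       // mutable, deliberately left out of equals/hashCode

    SalesItem(String clientCode, String orderNumber, String ticketsSold) {
        // Fail fast on nulls in the key fields, since equals/hashCode rely on them.
        this.mClientCode = Objects.requireNonNull(clientCode);
        this.mOrderNumber = Objects.requireNonNull(orderNumber);
        this.mTicketsSold = ticketsSold;
    }

    void setTicketsSold(String ticketsSold) {
        this.mTicketsSold = ticketsSold;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SalesItem)) {
            return false;
        }
        SalesItem other = (SalesItem) o;
        return mClientCode.equals(other.mClientCode)
                && mOrderNumber.equals(other.mOrderNumber);
    }

    @Override
    public int hashCode() {
        // Same two fields as equals, so equal objects always share a hash.
        return 31 * mOrderNumber.hashCode() + mClientCode.hashCode();
    }
}

final class SalesItemDemo {
    public static void main(String[] args) {
        Set<SalesItem> items = new HashSet<>();
        SalesItem item = new SalesItem("ACME", "100042", "3");
        items.add(item);
        item.setTicketsSold("5"); // hash is unchanged, so the item stays findable
        System.out.println(items.contains(new SalesItem("ACME", "100042", "0"))); // prints true
    }
}
```

The trade-off is that two objects with the same key but different ticket counts now compare as equal, which is only right if that matches how the list is searched.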

You can also use the HashCodeBuilder in the org.apache.commons.lang3 library.

Here is the documentation and an example:
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/builder/HashCodeBuilder.html

René