1

I have built an intern pool in Java using the same idea as String interning. Simply put, I maintain a

WeakHashMap<T, T>

Whenever the map already contains an equal object, the pool returns that existing instance; the benefit is that it saves Java heap memory. For example, I have a Person class like this:

public class Person {
    String name;
    int age;
    String employer;

    @Override
    public boolean equals(Object obj) {
        ......
    }

    @Override
    public int hashCode() {
        ......
    }
}

It has no field that makes an instance unique (no primary key). The problem is that to check whether the map contains a specific person, I first need to create a temporary Person so that map.containsKey() can call equals() on it. When I run a profiler to look at memory usage, I can see that the GC has collected a lot of these temporary objects, which will surely result in more GC activity and CPU usage. Is there a way to implement the intern pool idea without creating so many temporary objects?

p.s. I got the intern pool idea from this post: Generic InternPool<T> in Java?
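
For reference, the pool described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the class name `InternPool` and the method name `intern` are assumptions. It wraps the value in a `WeakReference` because the values of a `WeakHashMap` are held strongly, so a plain `WeakHashMap<T, T>` whose value is the key itself would keep every entry reachable. Note that the caller still has to construct the candidate object before calling `intern`.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical generic intern pool; the names are illustrative only. */
final class InternPool<T> {
    // The value is wrapped in a WeakReference so that the map never holds
    // a strong reference to its own key; otherwise no entry could ever be
    // garbage collected.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    /**
     * Returns the canonical instance that equals() the candidate,
     * registering the candidate itself if no such instance exists yet.
     */
    synchronized T intern(T candidate) {
        WeakReference<T> ref = pool.get(candidate);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical; // the candidate becomes garbage
        }
        pool.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }
}
```

With this sketch, two equal but distinct instances passed to `intern` yield the same reference for as long as that instance is strongly reachable somewhere else in the program.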

Yiwei
  • Can you keep the one temporary object you used for checking with `contains()` and reuse it? – Progman May 09 '19 at 20:06
  • How can I keep the temporary object? – Yiwei May 09 '19 at 20:07
  • You can save it in a field, maybe a static one. But that depends on the code and requirement you have. – Progman May 09 '19 at 20:10
  • if you need to create another dummy object to check if an object is in the hashmap, then you need to check your code again. What are you checking in the equals() method( I guess you are overriding it)? – aran May 09 '19 at 20:12
  • My requirement is to save heap memory by reducing duplicate objects, by keeping temporary objects, I think it will bloat the memory usage – Yiwei May 09 '19 at 20:12
  • @aran Yes, I override the equals for the class, so I can see if the object is in the map already – Yiwei May 09 '19 at 20:17
  • but what are you checking in that method? – aran May 09 '19 at 20:17
  • Checking that every field in the object is the same to make sure that two objects are same – Yiwei May 09 '19 at 20:18
  • Does it make sense to create a hash from your variables and use that hash as the HashMap key? That would be the unique ID of each object. Although I'm just guessing, I don't know the details. – aran May 09 '19 at 20:20
  • The garbage collector optimizes for quick cleanup of "young" objects since it is such a common occurrence to quickly create and discard lots of temporaries. See https://codeahoy.com/2017/08/06/basics-of-java-garbage-collection/ for details. – John Kugelman May 09 '19 at 20:21
  • @aran Yeah the only thing is what if two objects have the same hash value – Yiwei May 09 '19 at 20:23
  • Then it will be overwritten in the HashMap, keeping just the last instance of the object (as the keys are the same) – aran May 09 '19 at 20:23
  • @aran That's not what I want, I don't want to override the previous object – Yiwei May 09 '19 at 20:25
  • so then, you check if the hash exists in the map's keyset, and if it does, you return its value, by calling "yourMap.get(1029182190)" -- (as an example of hash) – aran May 09 '19 at 20:26
  • @aran But what if the new object is not actually in the map, but has the same hash with an object in the map. – Yiwei May 09 '19 at 20:29
  • Then that's a duplicate, so you don't store nor create it. I don't know where you gonna go... What i mean is, don't create an object to check, just get its variable values (for example, a string db connection's hash instead of a Connection object) and check them as the unique ID they are – aran May 09 '19 at 20:29
  • No I'm just saying If two objects have the same hashcode then they are NOT necessarily equal – Yiwei May 09 '19 at 20:31
  • Then you are missing something: that's the reason behind a pool, and the reason hashes exist. If two Connection object instances share the same connection db, then they are not the same instance/object, OK, BUT THEY SHOULD ACT EQUALLY, that's what you should care for. If you use the last one, it will act equally as if you create the new one, so yes, the string db hash is the only thing you need to know to create or not a new resource in the pool. If two objects whose hashes are equal aren't "equal" for your program logic, there's something missing there. – aran May 09 '19 at 20:33
  • I think you are talking about the object pool design pattern, but for the intern pool purpose it is different, for the intern pool, every object does not act equally, they are just totally different objects. – Yiwei May 09 '19 at 20:37
  • 1
    @aran Different objects can have the same hashcode or hash. – Progman May 09 '19 at 20:38
  • They don't if you create your hash correctly. Take, for example, a pool of TCP connections to local ports (leaving out the host to simplify). If inside the Connection object you create an "int port" variable, that's your hash. You store it as the key of the HashMap, and NO, no other type of pool access will share the same hash as your key. The port number is the ONLY thing you need to know to tell whether the resource you are trying to create already exists in the pool. Don't create a Connection object and use its internal check method, just check whether a resource accessing port 5 (5) exists in the keyset!! – aran May 09 '19 at 20:43
  • @aran I totally understand that, but the intern pool here serves a different purpose. For example, I have a simple class Person{int age, String name, String job}, and I want to make sure there are no duplicate Persons in the application, so I use the intern pool to intern the objects and use the equals method to compare the new object with the object in the map. – Yiwei May 09 '19 at 20:51
  • Why not use an `IdentityHashMap` instead, which won't call `equals()`? – Jacob G. May 09 '19 at 21:02
  • " The problem is when I want to check if the map contains the object, I will need to create a temporary object first ...". So just how does the String intern work? It certainly can't magically give you the same object without first constructing something with which to compare. It isn't obvious to me how you can really emulate the intern capability to do what you want with a HashMap. You may want to look at `weak references`. – WJS May 09 '19 at 21:13
  • @aran You're off base. By the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) hashes can and will collide. Your port number example is constructed to avoid the issue; as you admit, you excluded hostnames because they'd make it impossible to have unique hashes. – John Kugelman May 10 '19 at 11:39
  • @JohnKugelman you are not getting the point. Of course they will collide, and if they do (two file sends to port 5 will share the same key), then that's it, you already have a port 5 connection in there, you don't create a new one. That's how pools work, and no, my intention is not to create hashes that DON'T COLLIDE, because then you don't have a pool, you have a dummy thing that creates a new connection to the port 5 resource again, even if it already existed and is available to you. Hashes MUST collide in order to avoid duplicates, and to check if the resource is already available! – aran May 10 '19 at 11:59
  • @JohnKugelman I will add the host and the example is the same. Connection to serverA on port 5 will have the key serverA_5. If any other thread/resource/whatever wants to know if there's a connection to serverA on port 5, why in hell you create a new object and check via its internal equals() method if host and port are the same, when the only thing you need to check is the hash key, without creating any temp object? If someone asks for serverA_5 is because he wants that connection, no other connection but "host: ServerA; port:5" will ever share that key. Where is the problem in this logic? – aran May 10 '19 at 12:15
  • @aran For a data connection object, you could probably do that. What about any other class? Take the Person class I mentioned before as an example. You cannot simply compare the hash – Yiwei May 10 '19 at 12:48
  • @Yiwei simply adding a variable with the person's ID number/driving license number/security card number/telephone number ... would work, as any of them should be unique for each person. You won't need to create another Person object and check whether the overridden equals() says they are the same, just check if that (e.g.) telephone number already exists in the keyset. The idea is to take just a mix of your variable values that tells you: this is a duplicate, this already exists. Don't know if I am explaining myself, sorry for that. – aran May 10 '19 at 13:34
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/193174/discussion-on-question-by-yiwei-java-intern-pool-implementation-creates-too-many). – Bhargav Rao May 11 '19 at 00:57
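
The point argued in the comments above, that equal hash codes do not imply equal objects, can be illustrated with a small sketch. The class and method names here are made up for illustration; the only real APIs used are `String.hashCode()`, `String.equals()` and `HashMap`. The strings "Aa" and "BB" are a well-known pair with the same hash code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo class; not part of the original question. */
final class HashCollisionDemo {
    /** True if both strings share a hash code but are not equal. */
    static boolean collidesButDiffers(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    /** Inserts both keys and reports how many entries the map keeps. */
    static int entriesAfterInserting(String a, String b) {
        Map<String, String> map = new HashMap<>();
        map.put(a, a);
        map.put(b, b); // same bucket as a, but equals() keeps them apart
        return map.size();
    }
}
```

Because `HashMap` falls back to `equals()` within a bucket, colliding keys are stored as separate entries and nothing is overwritten. Using the raw hash value itself as the map key would lose that distinction.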

2 Answers

2

A HashMap<T, T> is not really a Map but a Set, unless you have done something very weird to your equals() and hashCode() methods. As commented, creating a temporary short-lived instance is cheap because the garbage collector uses generations. But what you must check for existence is the key, not the object itself.

Serg M Ten
  • 1
    I agree, but for the intern pool, internally the key and value are the same objects – Yiwei May 09 '19 at 20:56
  • In a pool, each object is essentially indistinguishable from any other. String interning and pooling are two different concepts. It is true that some runtimes, including Java's, do maintain a string intern pool. What's the exact nature of your objects? – Serg M Ten May 09 '19 at 21:11
  • I got the idea from this thread: https://stackoverflow.com/questions/3323807/generic-internpoolt-in-java. Basically, we can use the pool for any objects. In my application, after I run the profiler, I can see a lot of duplicate objects, that's why I started using an intern pool – Yiwei May 09 '19 at 21:32
  • Let's say that you have, for example, a `LogFileWriter` class with one instance per log file. You want to create a new `LogFileWriter` only if a previous `LogFileWriter` pointing to the same file does not exist. Then you have two options: a `HashMap<String, LogFileWriter>` where the String is the full file path, or a `Set<LogFileWriter>` where `LogFileWriter.equals()` and `LogFileWriter.hashCode()` have been overridden to compare LogFileWriters by their string paths. I do not know what the weak references suggested in the linked answer are for. – Serg M Ten May 10 '19 at 04:59
  • 1
    @SergMTen A `Map` lets you look up the canonical reference to an object, whereas a `Set` will tell you that you've interned the object but not give an easy way to get the interned copy. The OP is correct to use a `Map`. – John Kugelman May 10 '19 at 11:34
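
The `LogFileWriter` idea from the comments above could look roughly like this when the object has a natural key. This is only a sketch: `LogFileWriter` here is a stand-in that stores its path and opens no file, and `LogFileWriterRegistry` and `forPath` are hypothetical names. The lookup uses the path string directly, so no temporary `LogFileWriter` is created when one already exists.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the LogFileWriter mentioned in the comments; it opens no file. */
final class LogFileWriter {
    private final String path;

    LogFileWriter(String path) {
        this.path = path;
    }

    String path() {
        return path;
    }
}

/** Hypothetical registry that keeps one writer per file path. */
final class LogFileWriterRegistry {
    private final Map<String, LogFileWriter> byPath = new HashMap<>();

    /** Returns the existing writer for the path, creating it only if absent. */
    synchronized LogFileWriter forPath(String path) {
        return byPath.computeIfAbsent(path, LogFileWriter::new);
    }
}
```

This only works when a cheap, unique key such as the path exists. For a class like the asker's `Person`, where identity is the combination of all fields, a lookup key would have to carry those same fields.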
0

I don't know what you're doing to use up a lot of heap memory, but why not look at weak and soft references?

  • Weakly referenced objects are kept around only as long as at least one strong (hard) reference to them exists. Once the last strong reference is gone, the object becomes eligible for garbage collection and its weak references are cleared.
  • Soft references are only cleared when the JVM starts to run out of heap space, for example when you need more heap for other objects. They are guaranteed to be cleared before an OutOfMemoryError is thrown.

So depending on what you are doing, you may want to check them out.
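
As a rough illustration of the two reference types, the sketch below shows only the part that is deterministic: while a strong reference is still held, both a `WeakReference` and a `SoftReference` keep returning the object, even after a GC request. The class and method names are assumptions. What happens after the strong reference is dropped depends on the collector, so the sketch does not assert it.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/** Hypothetical demo class; not part of the original answer. */
final class ReferenceDemo {
    /** True if both references still see the object while it is strongly held. */
    static boolean reachableWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        SoftReference<Object> soft = new SoftReference<>(strong);

        System.gc(); // only a hint; the object is strongly reachable anyway

        // 'strong' is used here, so it stays reachable across the GC request.
        return weak.get() == strong && soft.get() == strong;
    }
}
```

Once no strong reference remains, `weak.get()` may start returning null after any collection, while `soft.get()` usually keeps returning the object until memory gets tight.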

WJS