-1

We have server APIs to support clients running on ten millions devices. Normally clients call server once a day. That is about 116 clients seen per second. For each client (each with unique ID), it may make several APIs calls concurrently. Server then need to sequence those API calls from the same client. Because, those API calls will update the same document in the Mongodb database. For example: last seen time and other embedded documents.

Therefore, I need to create a synchronization mechanism based on client's unique Id. After some research, I found String Pool is appealing and easy to implement. But, someone made a comment that locking on String Pool may conflict with other library/module which also use it. And, therefore, String Pool should never be used for synchronization purpose. Is the statement true? Or should I implement my own "String Pool" by WeakHashMap as mentioned in the link below?

Good explanation of String Pool implementation in Java: http://java-performance.info/string-intern-in-java-6-7-8/

Article stating String Pool should not be use for synchronization: http://www.journaldev.com/1061/thread-safety-in-java

==================================

Thanks for BeeOnRope's suggestion, I will use Guava's Interner to explain the solution. This way, client that don't send multiple requests at the same time will not be blocked. In addition, it guarantees only one API request from one client is processed at the same time. By the way, we need to use a wrapper class as it's bad idea to lock on String object as explained by BeeOnRope and the link he provided in his answer.

public class Key {
    private String id;

    public Key(String id) {
        this.id = id;
    }

    public String getId() {
        return id;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ( (id == null) ? 0 : id.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        Key other = (Key)obj;
        if (id == null) {
            if (other.id != null) return false;
        } else if (!id.equals(other.id)) return false;
        return true;
    }
}

Interner<Key> myIdInterner = Interners.newWeakInterner();

public void processApi1(String clientUniqueId, RequestType1 request) {
    synchronized(myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}

public void processApi2(String clientUniqueId, RequestType2 request) {
    synchronized(myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}
Raymond
  • 115
  • 2
  • 11
  • "The string pool is implemented as a **fixed** capacity hash map with each bucket containing a list" There are also space limitations – bichito Jul 22 '17 at 00:13
  • 1
    `String s = "123";` and `String s = "12" + "3";` are not the same reference. Intern cache is a poor source for synchronization. – Elliott Frisch Jul 22 '17 at 00:14
  • @efekctive - in practice this doesn't affect the functionality of locking on interned strings, since it is only talking about the size of the array backing the hash for the interned strings, but the number of strings can still grown unboundedly since each array entry heads a list of an unlimited number of elements. – BeeOnRope Jul 24 '17 at 04:15
  • Your answer suggests interning the strings first, it will need to hold 10 million just for synchronization purposes + locking. Plus whoever else needs the pool. I think this is bad idea overall. – bichito Jul 24 '17 at 04:30
  • Some clarifications: 1.String.intern() returns the reference to the same object in String Pool. 2. Java 7/8 can garbage collect String Pool and bigger default hashtable size for better performance 3. Don't need 10M objects as they will be garbage collected. 4. String.intern() performance is good if hash table size is prime number (e.g. 1000003). In addition, I believe my question is valid and other people probably also encountered the same scenario. Not sure why I got so negative feedback? – Raymond Jul 24 '17 at 16:03
  • @efekctive - no, the interner is _weak_ so it only holds the IDs that are actually live in the system (for example, have a request in progress). Often the weak interner _saves_ memory (in terms of the live set), rather than uses it: it takes strings and returns a single object for all strings that are equal so it effectively de-duplicates them. There is some overhead for the internal map of the interner of course, but I assume in the scope of a typical request, a few dozen bytes isn't going to be material. – BeeOnRope Jul 24 '17 at 19:38
  • Just post the code. They way I understand the problem there will be 10 million live strings. Plus some. The string pool will be locked during synchronization 10 million times. Post the code and numbers. – bichito Jul 24 '17 at 19:47
  • Sure if there are 10 million _concurrent_ clients there are 10 million strings **and** 10 million session objects or 10 million whatever else is live during a connection. That's normal. Now I'm quite sure you are never supporting 10 million concurrent clients on a single host in Java, but hey. I'm pretty sure the OP is not going to post his company's code on SO. You can read right that the start that there are only 116 requests per second, so you can extrapolate that you are talking 10s or 100s of concurrent requests and not 10 million. – BeeOnRope Jul 24 '17 at 19:48

1 Answers1

2

Well if your strings are unique enough (e.g., generated via a cryptographic hash1) synchronizing on client IDs will probably work, as long as you call String.intern() on them first. Since the IDs are unique, you aren't likely to run into conflicts with other modules, unless you happen to pass your IDs in to them and they follow the bad practice of locking on them.

That said, it is probably a bad idea. In addition to the small chance of one day running into unnecessary contention if someone else locks on the same String instance, the main problem is that you have to intern() all your String objects, and this often suffers from poor performance because of the native implementation of the string intern table, it's fixed size, etc. If you really need to lock based only on a String, you are better off using Guava's Interners.newWeakInterner() interner implementation, which is likely to perform much better. Wrap your string in another class to avoid clashing on the built-in String lock. More details on that approach in this answer.

Besides that, there is often another natural object to lock on, such as a lock in a session object, etc.

This is quite similar to this question which has more fleshed out answers.


1 ... or, at a minimum, have at least have enough bits to make collision unlikely enough and if your client IDs aren't part of your attack surface.

BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
  • Thanks for the feedback. It's good to know that Guava has an implementation to intern any class. That will work. I just need to declare a class with a single string property of "Client's unique ID". That will put my mind in peace that it won't conflict with any other library which use String.intern(). BTW, if you check the String pool implementation link I provided, this guy used 1000003 (prime number?) as the hashtable size and get a very good performance with String.intern(). He also use WeakHashMap to implement his own with similar performance but much higher heap space needed. – Raymond Jul 24 '17 at 15:45
  • @Raymond - well of course `WeakHashMap` uses heap space since it is a java thing, while the string intern table is a "native" thing so it doesn't show up in the heap size, but it is still using native memory behind the covers! In some ways that is much worse since you don't have as much visibility into it. You can get weird cases like running out of memory or swapping while your heap size is still reasonable. – BeeOnRope Jul 24 '17 at 19:42
  • While `String.intern()` performance might be acceptable, I find it unlikely that it is better than `WeakInterner`. A nice advantage of WeakInterner is that it only deals with the strings you give it in this case client IDs. `String.intern()` relies on a globally shared map, and you can't control its lifetime, contention levels, etc. You'll have less contention with specific interners for specific purposes, and you get good "CHM-like" performance too. Finally, it's just plain Java so you can examine it, profile it, etc. – BeeOnRope Jul 24 '17 at 19:46
  • BTW, it occurs to me that you actually asking [this question](https://stackoverflow.com/questions/41898355/lock-handler-for-arbitrary-keys). You want a lock handler that allows you to lock on a `String` key. There are several solutions there with varying performance and complexity. – BeeOnRope Jul 24 '17 at 19:47