0

I am working with Cassandra and using Datastax Java driver. I am trying to reuse prepared statements by caching it.

  private static final ConcurrentHashMap<String, PreparedStatement> holder = new ConcurrentHashMap<>();

  public BoundStatement getStatement(String cql) {
    Session session = TestUtils.getInstance().getSession();
    PreparedStatement ps = holder.get(cql);
    // no statement is cached, create one and cache it now.
    if (ps == null) {
      holder.putIfAbsent(cql, session.prepare(cql));
    }
    return ps.bind();
  }

Prepared Statement and BoundStatement of datastax java driver.

This getStatement method will be called by multiple threads so I have to make sure it is thread safe. I am working with Java 7.

What will putIfAbsent do here if we get two same cql prepared statements? Is my code thread safe and there is no race condition?

Update:-

  public BoundStatement getStatement(String cql) {
    Session session = TestUtils.getInstance().getSession();
    PreparedStatement ps = holder.get(cql);
    // no statement is cached, create one and cache it now.
    if (ps == null) {
      synchronized (this) {
        ps = holder.get(cql);
        if (ps == null) {
          ps = session.prepare(cql);
          holder.put(cql, ps);
        }
      }
    }
    return ps.bind();
  }
john
  • 11,311
  • 40
  • 131
  • 251

2 Answers2

1

Your code has a race condition which can result in session.prepare(cql) being called twice (or more) for any cql parameter. The putIfAbsent doesn't really offer any advantage over a normal put in this case.

If you were on Java 8, you could write this efficiently without creating duplicates with

PreparedStatement ps = holder.computeIfAbsent(cql, key -> session.prepare(key));
Kayaman
  • 72,141
  • 5
  • 83
  • 121
  • If anyone has a great usecase for `putIfAbsent`, I'd like to hear it. – Kayaman Nov 29 '16 at 19:46
  • I am still on Java 7. What can I do to avoid this issue with Java 7? – john Nov 29 '16 at 19:57
  • @david To be honest I don't have a very elegant solution for Java 7. You'd need to manually lock the `get / check for null / put` part. – Kayaman Nov 29 '16 at 20:12
  • I see, can you provide an example for that if possible. That will help me to understand how to lock get and put in a atomic way. – john Nov 29 '16 at 20:18
  • @david Well, with a `Lock` for example, or isolate the operations in a `synchronized` method. Whatever to make it atomic. – Kayaman Nov 29 '16 at 20:19
  • It’s hard to say what the correct solution is, when there is so little context. If this method is called by different threads with the same string, it might return the same statement. So the first question is, whether this is correct, i.e. whether the statement works when being executed by different threads at the same time. Compared to that, the potential impact of having two different statement instances for the same string, is ridiculously low. You still can enforce the use of the same statement instance by using the result of `putIfAbsent`, if non-`null`. – Holger Nov 29 '16 at 20:58
  • `putIfAbsent` is useful, whenever it is acceptable to temporarily calculate the same value/ create an equivalent instance, as long as the final result will be the canonical value stored in the map or that potential duplicate calculation has a minor performance impact as concurrent invocations with the same key rarely happen. – Holger Nov 29 '16 at 21:01
  • @Holger Yeah. Less obvious than with `computeIfAbsent` though. – Kayaman Nov 29 '16 at 21:17
  • Of course, `computeIfAbsent` is much easier to use, but `putIfAbsent` exists since Java 5, created at a time when the necessary features didn’t exist to allow `computeIfAbsent`, no functional interfaces nor no lambda expressions and no support for bucket locking within the CHM implementation… – Holger Nov 29 '16 at 21:22
1

You indeed do have a race condition, but your putIfAbsent is still probably slightly better than pure put, as it keeps the old statement, so there are never two instances in real use. The advantage is tiny as it manifests only in case of a race condition.

It looks like a version of a Double-checked locking. Just put a synchronized block around the initialization a recheck if it's really needed.

I'm rather sceptical about caching the PreparedStatement as it's mutable. You may need multiple instances for the same cql. This is doable, too, but you'd need to acquire and release them properly.

maaartinus
  • 44,714
  • 32
  • 161
  • 320
  • My idea of caching the prepared statement came from [this](http://stackoverflow.com/questions/22915840/re-using-preparedstatement-when-using-datastax-cassandra-driver) otherwise it will keep giving warning every time. – john Nov 29 '16 at 21:18
  • I have updated the question with synchronized block. That's what you meant? – john Nov 29 '16 at 21:22
  • @david Yes, the synchronized block looks fine. Concerning the mutability, [this answer](http://stackoverflow.com/a/22917891/581205) looks good (I know nothing about Cassandra). Btw., your original race condition is acceptable as all you risk is getting the efficiency warning. – maaartinus Nov 29 '16 at 21:33