
I am using XTDB 1.21.0 deployed on AWS ECS (Fargate) with checkpoints configured (frequency: 30 minutes) and stored in an S3 bucket (RocksDB). After a couple of successful checkpoints, they start failing consistently with an XTDB warning caused by an exception in the HTTP request to AWS.

This leaves the S3 bucket with incomplete checkpoints (i.e., a folder containing a set of SSTs and other RocksDB files but no associated EDN index file).

The XTDB documentation mentions that an optional S3Configurator can be passed to the node configuration, and after a bit of Googling I figured that makeClient should be overridden so that connectionAcquisitionTimeout can be set:

NettyNioAsyncHttpClient.builder()
.maxConcurrency(200)
.connectionAcquisitionTimeout(Duration.ofMillis(20000))

I am not too familiar with Netty, so I would appreciate it if someone could help with the right incantation.

Also, I am configuring the XT node from an EDN file and haven't figured out how to write an S3 configurator in an EDN file (or whether it is even possible).

Thanks in advance!

modality

1 Answer


This can happen with large datasets: the default S3 client creates a new async request for each object, and the number of objects may be very large, particularly when using the RocksDB index. Internally, the SDK uses connectionAcquisitionTimeout as a form of backpressure, so that incoming requests don't wait indefinitely for a connection from the connection pool. In this case, however, we're the only source of these requests, and we definitely want them to complete before starting the nodes, so it's reasonable to set connectionAcquisitionTimeout to something very high (the default is only 10 seconds). A good choice of limit might be the maximum amount of time you are willing to wait for the node to start before failing.

This appears to be a non-optional parameter of the SDK, presumably as a sensible default strategy for requests coming from an external source. In our case, we essentially want it to behave as if it were a synchronous operation.

Configuring this in Clojure with XTDB would look something like this:

(ns foo.db
  (:require
   [xtdb.api :as xtdb]
   [xtdb.checkpoint]
   [xtdb.rocksdb]
   [xtdb.s3.checkpoint])
  (:import
   (java.time Duration)
   (software.amazon.awssdk.http.nio.netty NettyNioAsyncHttpClient)
   (software.amazon.awssdk.services.s3 S3AsyncClient)
   (xtdb.checkpoint Checkpointer)
   (xtdb.s3 S3Configurator)))

(def s3-configurator
  (reify S3Configurator
    (makeClient [this]
      (.. (S3AsyncClient/builder)
          (httpClientBuilder
           (.. (NettyNioAsyncHttpClient/builder)
               (connectionAcquisitionTimeout
                (Duration/ofSeconds 600)) ;; Set a high limit here

               ;; We can rely on the defaults for maxConcurrency and
               ;; maxPendingConnectionAcquires
               ;; (maxConcurrency (Integer. 200))
               ;; (maxPendingConnectionAcquires (Integer. 10000))

               ))
          (build)))))

(defn start-node!
  []
  (xtdb/start-node
    {:xtdb/index-store
     {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                 :db-dir "/var/xtdb/idxs"
                 :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                                :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                                        :configurator (constantly s3-configurator)
                                        :bucket "checkpoints"}
                                :approx-frequency "PT3H"}}}}))
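
For a node configured from an EDN file, one possible approach (an untested sketch, not something confirmed for this setup) is to point the :configurator key at a module symbol, on the assumption that XTDB resolves :xtdb/module symbols for your own namespaces the same way it does for its built-in modules. Here foo.db/->s3-configurator is a hypothetical one-argument function, e.g. (defn ->s3-configurator [_] s3-configurator), that returns the configurator defined above; its namespace would need to be on the node's classpath.

```clojure
;; xtdb.edn (sketch only)
;; ASSUMPTION: foo.db/->s3-configurator is a hypothetical function of one
;; argument (the module options) that returns the S3Configurator from foo.db.
{:xtdb/index-store
 {:kv-store {:xtdb/module xtdb.rocksdb/->kv-store
             :db-dir "/var/xtdb/idxs"
             :checkpointer {:xtdb/module xtdb.checkpoint/->checkpointer
                            :store {:xtdb/module xtdb.s3.checkpoint/->cp-store
                                    ;; ASSUMPTION: resolved like any other module symbol
                                    :configurator {:xtdb/module foo.db/->s3-configurator}
                                    :bucket "checkpoints"}
                            :approx-frequency "PT3H"}}}}
```

If the symbol cannot be resolved at startup, building the node options in Clojure as shown above remains the fallback.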
Tim Greene