I am looking at Kudu's documentation. Below is a partial excerpt describing kudu-spark.
https://kudu.apache.org/docs/developing.html#_avoid_multiple_kudu_clients_per_cluster
Avoid multiple Kudu clients per cluster.
One common kudu-spark coding error is instantiating extra KuduClient objects. In kudu-spark, a KuduClient is owned by the KuduContext. Spark application code should not create another KuduClient connecting to the same cluster. Instead, application code should use the KuduContext to access a KuduClient using KuduContext#syncClient.

To diagnose multiple KuduClient instances in a Spark job, look for signs in the logs of the master being overloaded by many GetTableLocations or GetTabletLocations requests coming from different clients, usually around the same time. This symptom is especially likely in Spark Streaming code, where creating a KuduClient per task will result in periodic waves of master requests from new clients.
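For context, the pattern the documentation recommends looks roughly like the sketch below: create one KuduContext per application and reuse the client it owns via KuduContext#syncClient, rather than building fresh KuduClient objects in tasks. The master address and table name here are placeholders, and the sketch assumes the kudu-spark dependency is on the classpath and a Kudu cluster is reachable:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

object KuduSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kudu-spark-sketch")
      .getOrCreate()

    // Create ONE KuduContext for the whole application.
    // It owns the single KuduClient for this cluster.
    // "kudu-master:7051" is a placeholder master address.
    val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

    // Anti-pattern (do NOT do this, especially inside a streaming task):
    //   val badClient = new KuduClient.KuduClientBuilder("kudu-master:7051").build()

    // Correct: reuse the client the KuduContext already owns.
    val client = kuduContext.syncClient
    val table = client.openTable("my_table") // placeholder table name
    // ... read/write through kuduContext or this client ...

    spark.stop()
  }
}
```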
Does this mean that I can only run one kudu-spark job at a time?
If I have a Spark Streaming program that is continuously writing data to Kudu, how can I connect to Kudu from other Spark programs?