
I need to find where Samza on YARN places its key-value (KV) state stores. I suspect they live in the YARN local application directory, like the data of every other YARN application, but I believe the location is configurable: I did this a few months ago in a different environment (I mapped the folder to memory), but I don't recall how.

To do that, I need to be able to separate the Samza KV stores from the YARN data of other applications.

Billal Begueradj
Edi Bice

2 Answers


Here's the solution. It was printed in the Samza job log output:

[WARN] No override was provided for logged store base directory. This disables local state re-use on application restart. If you want to enable this feature, set LOGGED_STORE_BASE_DIR as an environment variable in all machines running the Samza container

LOGGED_STORE_BASE_DIR can be set as part of the NodeManager startup. For example:

# Typical environment setup.
export JAVA_HOME=...
export YARN_CONF_DIR=...
export YARN_LOG_DIR=...
export HADOOP_LOG_DIR=...
export YARN_MASTER=...
export YARN_PID_DIR=...
export YARN_IDENT_STRING=...
export YARN_NICENESS=...
export YARN_OPTS="-XX:+UseG1GC -XX:ErrorFile=logs/hs_err.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log"

# Location of samza-kv stores for host affinity (should be on an SSD).
export LOGGED_STORE_BASE_DIR="/mnt/myssd/samza/logged-stores"

# Startup the Yarn NodeManager
./yarn-daemon.sh --config "$YARN_CONF_DIR" start nodemanager
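The following is a minimal sketch of a pre-flight check one could run just before starting the NodeManager, assuming LOGGED_STORE_BASE_DIR has been exported as above. The function name check_logged_store_dir is hypothetical and not part of Samza or Hadoop; it only confirms that the variable is set, that the directory exists and is writable, and which filesystem backs it (useful for confirming that the SSD, or a memory-backed mount, is actually in use).

```shell
# Hypothetical pre-flight check (not part of Samza/Hadoop).
# Assumes LOGGED_STORE_BASE_DIR was exported earlier in the same script.
check_logged_store_dir() {
  local dir="${LOGGED_STORE_BASE_DIR:-}"
  if [ -z "$dir" ]; then
    echo "LOGGED_STORE_BASE_DIR is not set; local state will not be re-used on restart" >&2
    return 1
  fi
  # Create the directory if it does not exist yet.
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "LOGGED_STORE_BASE_DIR ($dir) is not writable by $(id -un)" >&2
    return 1
  fi
  # Report the device and mount point backing the directory (POSIX df output, data on line 2).
  df -P "$dir" | awk 'NR == 2 { print "store dir on " $1 " mounted at " $6 }'
}
```

For example, placing check_logged_store_dir || exit 1 right before the yarn-daemon.sh line would keep the NodeManager from starting without the override.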
Jon Bringhurst
Edi Bice

The store path is configurable only if the store has a changelog enabled.

The location of such stores is controlled by the environment variable LOGGED_STORE_BASE_DIR.
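To confirm that the variable actually reaches the processes that need it, one option on Linux is to read a process's startup environment from /proc. This is a hedged diagnostic sketch, not part of Samza; the helper name show_logged_store_env is hypothetical, and you would pass it the PID of the NodeManager or of a Samza container JVM.

```shell
# Hypothetical helper (Linux only): print LOGGED_STORE_BASE_DIR as seen by a running process.
# /proc/<pid>/environ holds the environment the process started with, NUL-separated.
show_logged_store_env() {
  local pid="$1"
  if [ ! -r "/proc/$pid/environ" ]; then
    echo "cannot read /proc/$pid/environ (wrong PID or insufficient permissions)" >&2
    return 2
  fi
  # Turn NUL separators into newlines and keep only the variable of interest.
  if ! tr '\0' '\n' < "/proc/$pid/environ" | grep '^LOGGED_STORE_BASE_DIR='; then
    echo "LOGGED_STORE_BASE_DIR is not set for PID $pid" >&2
    return 1
  fi
}
```

Because /proc/<pid>/environ reflects the environment at process start, a variable exported later in an already running shell will not show up there; the process has to be restarted.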

More details are in the Samza YARN host affinity documentation: http://samza.apache.org/learn/documentation/0.11/yarn/yarn-host-affinity.html

JMakes