I would like to take a string of options for a spark-submit command and format it with --conf
inserted before each option. This
concatConf :: String -> String
concatConf = foldl (\acc c -> acc ++ " --conf " ++ c) "" . words
works for most collections of options, e.g.,
λ => concatConf "spark.yarn.memoryOverhead=3g spark.default.parallelism=1000 spark.yarn.executor.memoryOverhead=2000"
" --conf spark.yarn.memoryOverhead=3g --conf spark.default.parallelism=1000 --conf spark.yarn.executor.memoryOverhead=2000"
But on occasion there can be a spark.executor.extraJavaOptions setting, whose value is a space-separated list of additional options enclosed in escaped quotes; for example,
"spark.yarn.memoryOverhead=3g spark.executor.extraJavaOptions=\"-verbose:gc -XX:+UseSerialGC -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy\" spark.default.parallelism=1000 spark.yarn.executor.memoryOverhead=2000"
and the concatConf function above obviously breaks down, because words also splits on the spaces inside the quoted value.
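To make the failure concrete, here is a minimal sketch that runs the same fold on a shortened version of that input. The name sample is introduced here for illustration only.

```haskell
-- The original fold, unchanged.
concatConf :: String -> String
concatConf = foldl (\acc c -> acc ++ " --conf " ++ c) "" . words

-- Hypothetical, shortened input containing one quoted value.
sample :: String
sample = "spark.executor.extraJavaOptions=\"-verbose:gc -XX:+UseSerialGC\" spark.default.parallelism=1000"

-- words sample produces three tokens, cutting the quoted value in two:
--   spark.executor.extraJavaOptions="-verbose:gc
--   -XX:+UseSerialGC"
--   spark.default.parallelism=1000
-- so concatConf sample puts a stray --conf inside the quotes:
--   " --conf spark.executor.extraJavaOptions=\"-verbose:gc --conf -XX:+UseSerialGC\" --conf spark.default.parallelism=1000"
```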
The following function, using the regex-compat library, works for this example
import Data.Monoid ((<>))
import Text.Regex (mkRegex, matchRegexAll)
concatConf :: String -> String
concatConf conf =
  let regex = mkRegex "(\\ *.*extraJavaOptions=\\\".*\\\")"
  in case matchRegexAll regex conf of
       -- x: text before the match, y: the match itself, z: text after it
       Just (x, y, z, _) -> insConf x <> " --conf " <> y <> insConf z
       Nothing -> ""
  where insConf = foldl (\acc c -> acc ++ " --conf " ++ c) "" . words
until you realize there is also spark.driver.extraJavaOptions, which comes in the same format. In any case, this function does not work when no such option is present: it returns an empty string. Now I'm struggling with many cases: whether neither, one, or both of these options are present, which one appears first in the string, and so on.
This makes me feel that regex isn't the right tool for the job. Hence my question: what is the right tool for this job?
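For reference, one regex-free direction is a small hand-written splitter that only breaks on whitespace outside double quotes, so it does not matter which extraJavaOptions settings appear, how many, or in what order. The following is a minimal sketch under assumptions, not a definitive answer: the names splitOutsideQuotes and concatConfQuoted are hypothetical, quotes are assumed to be balanced, and escaped quotes nested inside a quoted value are not handled.

```haskell
import Data.Char (isSpace)

-- Split on whitespace, except inside double quotes.
-- Hypothetical helper written for this sketch; not from any library.
splitOutsideQuotes :: String -> [String]
splitOutsideQuotes = filter (not . null) . go False ""
  where
    -- inQuote: whether we are inside a quoted value
    -- cur:     the current token, accumulated in reverse
    go _ cur [] = [reverse cur]
    go inQuote cur (c:cs)
      | c == '"'                 = go (not inQuote) (c : cur) cs
      | isSpace c && not inQuote = reverse cur : go False "" cs
      | otherwise                = go inQuote (c : cur) cs

-- Prefix every quote-aware token with " --conf ".
concatConfQuoted :: String -> String
concatConfQuoted = concatMap (" --conf " ++) . splitOutsideQuotes
```

With this sketch, concatConfQuoted "a=1 b=\"x y\" c=2" gives " --conf a=1 --conf b=\"x y\" --conf c=2", and an input without any quoted value behaves like the original fold. If the quoting rules get more involved, a parser-combinator library would be the natural next step.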