I finished some code in a Spark notebook and then moved it into a real project. I use sbt to build a jar and run it with spark-submit.
Problem: The job takes only 10 minutes to produce the result in the Spark notebook, but almost 3 hours when I run it with spark-submit.
Info: The Spark version, the Scala version, and the parameters (master URL, executor cores/memory, etc.) are all the same in the notebook and in spark-submit.
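To double-check that the two runs really use the same settings, I could dump the effective configuration in both environments and compare the dumps. The following is only a sketch: `ConfDiff` and `diff` are hypothetical names (not part of Spark), and the two maps are assumed to come from `spark.sparkContext.getConf.getAll.toMap`, run once in the notebook and once in the submitted job.

```scala
// Hypothetical helper (not part of Spark): compares two configuration dumps.
object ConfDiff {
  // Returns key -> (left value, right value) for every key whose values differ.
  // A key that is missing on one side is reported as "<unset>".
  def diff(left: Map[String, String],
           right: Map[String, String]): Map[String, (String, String)] =
    (left.keySet ++ right.keySet).toSeq.flatMap { key =>
      val l = left.getOrElse(key, "<unset>")
      val r = right.getOrElse(key, "<unset>")
      if (l == r) None else Some(key -> ((l, r)))
    }.toMap
}
```

An empty result would mean the two dumps are identical. Any entry that shows up (for example a different master URL or executor memory) would be a setting to look at first.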
Suspect 1: Could it be the logging (LogFactory.getLog().info("xxxx"))? Maybe the program spends too much time printing or saving the log messages?
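One way I could test this suspicion is to time the same step with the log calls enabled and then with them commented out. A minimal sketch, assuming a small hypothetical helper named `Timing.timed` (my own illustration, not a Spark or logging API):

```scala
// Hypothetical helper for measuring how long a block of code takes.
object Timing {
  // Evaluates `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

Wrapping the suspicious step as `val (result, ms) = Timing.timed { /* step with logging */ }` and repeating it without the log calls would show whether logging accounts for a meaningful share of the 3 hours. Because Spark transformations are lazy, the wrapped block would need to include an action for the measurement to mean anything.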
Suspect 2: Could it be the code? I didn't make any big changes to the notebook code; I just created a function, put the code inside it, and ran it. Should I partition the data or do something similar?