34

I'm building an Apache Spark Streaming application and cannot make it log to a file on the local filesystem when running it on YARN. How can I achieve this?

I've set up my log4j.properties file so that it successfully writes to a log file in the /tmp directory on the local file system (shown partially below):

log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/application.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

When I run my Spark application locally by using the following command:

spark-submit --class myModule.myClass --master local[2] --deploy-mode client myApp.jar

It runs fine and I can see that log messages are written to /tmp/application.log on my local file system.

But when I run the same application via YARN, e.g.

spark-submit --class myModule.myClass --master yarn-client  --name "myModule" --total-executor-cores 1 --executor-memory 1g myApp.jar

or

spark-submit --class myModule.myClass --master yarn-cluster  --name "myModule" --total-executor-cores 1 --executor-memory 1g myApp.jar

I cannot see any /tmp/application.log on the local file system of the machine that runs YARN.

What am I missing?

Emre Sevinç
  • I just pasted your section of log4j.properties and ran it locally, similarly to yours, but it isn't creating any log file in my /tmp. Am I missing something? – user1870400 Sep 12 '17 at 09:37
  • I found this post useful: https://stackoverflow.com/questions/27781187/how-to-stop-messages-displaying-on-spark-console – Rahul Sharma Feb 23 '18 at 16:25

5 Answers

26

It looks like you'll need to append to the JVM arguments used when launching your tasks/jobs.

Try editing conf/spark-defaults.conf as described here:

spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j.properties

spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j.properties

Alternatively, try editing conf/spark-env.sh as described here to add the same JVM argument, although the entries in conf/spark-defaults.conf should suffice.
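For reference, a sketch of what that spark-env.sh entry might look like (SPARK_JAVA_OPTS is the legacy mechanism, deprecated in Spark 1.x in favor of the spark-defaults.conf entries above; the install path is the same assumption as before):

SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j.properties"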

If you are still not getting any joy, you can explicitly pass the location of your log4j.properties file on the command line along with your spark-submit. This works if the file is contained within your JAR file and in the root directory of your classpath:

spark-submit --class sparky.MyApp --master spark://my.host.com:7077 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-executor.properties" myapp.jar

If the file is not on your classpath, use the file: prefix and the full path, like this:

spark-submit ... --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j-executor.properties" ...
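Note that the examples above only set the executor option. In yarn-cluster mode the driver also runs inside a YARN container, so you would likely need the driver-side counterpart as well; a sketch under the same path assumption:

spark-submit ... --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j-executor.properties" ...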
Brad
  • I'm a little confused. My log4j.properties file is packaged into the JAR file. My application is a Maven project, and I'm creating a single, big, self-contained JAR file that also has the log4j.properties file in it. This works fine when I run Spark locally. Is this (putting the log4j.properties into the JAR) not possible when I run Spark on the YARN cluster? – Emre Sevinç Feb 11 '15 at 14:49
  • Yes, it should be possible. I'm trying to help simplify the problem by stripping down the configuration you're using. I'm not an expert in Spark; I have a standalone cluster logging on a Windows host. – Brad Feb 11 '15 at 14:58
  • @Emre I had a play around with the various settings mentioned in my answer, which led me to re-write it. I got the logging working by editing `spark-defaults.conf`, and then also got it working by using `spark-submit --conf`. Either one or the other should work (you shouldn't need both). – Brad Feb 11 '15 at 17:09
  • Hi, can the above be used for a log file that is neither on the classpath nor contained within the jar? I've copied the log.properties to the executor and used the command line arg, but it tells me it can't be found. Thanks. – null Oct 03 '16 at 09:49
  • Yes. The examples above using `=file:/apps/` are referencing an absolute path on disk, not a file on the classpath or in the JAR. – Brad Oct 04 '16 at 20:13
  • @EmreSevinç this will not work. The Spark worker app is not your jar, but an application that runs Spark code sent from the driver; see my answer. You must use a static file (in HDFS or any other shared file system, or via the --files Spark directive). – Thomas Decaux Oct 10 '18 at 12:26
  • Many years later... I agree that the best solution is using `spark-submit --files "./log4j.properties" ...`, where log4j.properties resides in the directory from where you execute this command. The properties file will be sent to the driver and executors. – Brad Oct 10 '18 at 16:51
8

The above options, which specify the log4j.properties using spark.executor.extraJavaOptions and spark.driver.extraJavaOptions, only log locally, and the log4j.properties file must be present locally on each node.

As specified in the https://spark.apache.org/docs/1.2.1/running-on-yarn.html documentation, you can alternatively upload log4j.properties along with your application using the --files option. YARN then aggregates the logs on HDFS, and you can access them using the command:

yarn logs -applicationId <application id>
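For example, a submit command along these lines ships the properties file into each container's working directory, where it can then be referenced by its bare name (the local path to log4j.properties is a placeholder; the class and JAR names are the question's own):

spark-submit --class myModule.myClass \
    --master yarn-cluster \
    --files /path/to/log4j.properties \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    myApp.jar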
Chandra
  • Do you have an example of a log4j.properties file to log to HDFS? – Irene Oct 22 '15 at 15:56
  • @Irene This is the only difference from the otherwise regular log4j.properties file: **log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log** – Chandra Oct 28 '15 at 19:01
3

1) To debug how Spark on YARN interprets your log4j settings, use the -Dlog4j.debug flag.

2) Spark creates two kinds of YARN containers: the driver and the workers. You want to share a file from the machine where you submit the application with all containers (you cannot use a file inside the JAR, since that is not the JAR that actually runs), so you must use the --files spark-submit directive (this shares the file with all workers).

Like this:

spark-submit \
    --class com.X.datahub.djobi.Djobi \
    --files "./log4j.properties" \
    --driver-java-options "-Dlog4j.debug=true -Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=log4j.properties" \
    ./target/X-1.0.jar "$@"

Where log4j.properties is a project file inside the src/main/resources/config folder.

I can see in the console:

log4j: Trying to find [config/log4j.properties] using context classloader org.apache.spark.util.MutableURLClassLoader@5bb21b69.
log4j: Using URL [jar:file:/home/hdfs/djobi/latest/lib/djobi-1.0.jar!/config/log4j.properties] for automatic log4j configuration.
log4j: Reading configuration from URL jar:file:/home/hdfs/djobi/latest/lib/djobi-1.0.jar!/config/log4j.properties

So the file is taken into account; you can also check in the Spark web UI.

Thomas Decaux
1

Alternatively, you can use log4j's PropertyConfigurator to define your custom log properties.

Example:

 import org.apache.log4j.Logger;
 import org.apache.log4j.PropertyConfigurator;

 public class MySparkApp {

   static Logger logger = Logger.getLogger(MySparkApp.class.getName());

   public static void main(String[] args) {

     // Load the log4j configuration from the properties file passed as the first argument
     PropertyConfigurator.configure(args[0]);

     logger.info("Entering application.");

     logger.info("Exiting application.");
   }
 }
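You would then pass the location of the properties file as the first program argument, for example (the file path is a placeholder):

spark-submit --class MySparkApp --master yarn-client myApp.jar /tmp/log4j.properties

Note that PropertyConfigurator.configure only affects the JVM that calls it, so on YARN this configures the driver; to configure the executors you would typically still ship the file with --files as described in the other answers.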

Your properties file should have the following properties:

log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/application.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

EDIT: Updated the link to the log4j docs. Recent Spark versions use log4j 2, not v1.2.

Ref: http://logging.apache.org/log4j/2.x/

EmmaOnThursday
  • I don't know if this will work on both the executor and the driver. In the above code, logging will probably be configured only for the driver. – panther Apr 25 '17 at 22:58
1

In your log4j.properties file, you should also change log4j.rootCategory from INFO, console to INFO, file.

# before
log4j.rootCategory=INFO, console
# after
log4j.rootCategory=INFO, file
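Putting this together with the appender definition from the question, a minimal complete log4j.properties might look like this (a sketch):

log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/application.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n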
Bing