If you try this:

spark-submit \
  --packages "org.apache.hadoop:hadoop-aws:2.7.4" \
  pyspark-example.py

You get a large amount of noisy output as spark-submit resolves all the dependencies of the hadoop-aws package and downloads them. You get slightly less output if the packages are already downloaded, but it's still a lot:

org.apache.hadoop:hadoop-aws:2.7.4 pyspark-example.py
Ivy Default Cache set to: /home/ec2-user/.ivy2/cache
The jars for the packages stored in: /home/ec2-user/.ivy2/jars
:: loading settings :: url = jar:file:/hadoop/spark/spark-2.2.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found org.apache.hadoop#hadoop-aws;2.7.4 in central
    found org.apache.hadoop#hadoop-common;2.7.4 in central
    found org.apache.hadoop#hadoop-annotations;2.7.4 in central
    found com.google.guava#guava;11.0.2 in central
    found com.google.code.findbugs#jsr305;3.0.0 in central
    found commons-cli#commons-cli;1.2 in central
    found org.apache.commons#commons-math3;3.1.1 in central
    found xmlenc#xmlenc;0.52 in central
    found commons-httpclient#commons-httpclient;3.1 in central
    found commons-logging#commons-logging;1.1.3 in central
    found commons-codec#commons-codec;1.4 in central
    found commons-io#commons-io;2.4 in central
    found commons-net#commons-net;3.1 in central
    found commons-collections#commons-collections;3.2.2 in central
    found javax.servlet#servlet-api;2.5 in central
    found org.mortbay.jetty#jetty;6.1.26 in central
    found org.mortbay.jetty#jetty-util;6.1.26 in central
    found org.mortbay.jetty#jetty-sslengine;6.1.26 in central
    found com.sun.jersey#jersey-core;1.9 in central
    found com.sun.jersey#jersey-json;1.9 in central
    found org.codehaus.jettison#jettison;1.1 in central
    found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
    found javax.xml.bind#jaxb-api;2.2.2 in central
    found javax.xml.stream#stax-api;1.0-2 in central
    found javax.activation#activation;1.1 in central
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
    found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central
    found org.codehaus.jackson#jackson-xc;1.9.13 in central
    found com.sun.jersey#jersey-server;1.9 in central
    found asm#asm;3.2 in central
    found log4j#log4j;1.2.17 in central
    found net.java.dev.jets3t#jets3t;0.9.0 in central
    found org.apache.httpcomponents#httpclient;4.2.5 in central
    found org.apache.httpcomponents#httpcore;4.2.5 in central
    found com.jamesmurty.utils#java-xmlbuilder;0.4 in central
    found commons-lang#commons-lang;2.6 in central
    found commons-configuration#commons-configuration;1.6 in central
    found commons-digester#commons-digester;1.8 in central
    found commons-beanutils#commons-beanutils;1.7.0 in central
    found commons-beanutils#commons-beanutils-core;1.8.0 in central
    found org.slf4j#slf4j-api;1.7.10 in central
    found org.apache.avro#avro;1.7.4 in central
    found com.thoughtworks.paranamer#paranamer;2.3 in central
    found org.xerial.snappy#snappy-java;1.0.4.1 in central
    found org.apache.commons#commons-compress;1.4.1 in central
    found org.tukaani#xz;1.0 in central
    found com.google.protobuf#protobuf-java;2.5.0 in central
    found com.google.code.gson#gson;2.2.4 in central
    found org.apache.hadoop#hadoop-auth;2.7.4 in central
    found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in central
    found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in central
    found org.apache.directory.api#api-asn1-api;1.0.0-M20 in central
    found org.apache.directory.api#api-util;1.0.0-M20 in central
    found org.apache.zookeeper#zookeeper;3.4.6 in central
    found org.slf4j#slf4j-log4j12;1.7.10 in central
    found io.netty#netty;3.6.2.Final in central
    found org.apache.curator#curator-framework;2.7.1 in central
    found org.apache.curator#curator-client;2.7.1 in central
    found com.jcraft#jsch;0.1.54 in central
    found org.apache.curator#curator-recipes;2.7.1 in central
    found org.apache.htrace#htrace-core;3.1.0-incubating in central
    found org.mortbay.jetty#servlet-api;2.5-20081211 in central
    found javax.servlet.jsp#jsp-api;2.1 in central
    found jline#jline;0.9.94 in central
    found junit#junit;4.11 in central
    found org.hamcrest#hamcrest-core;1.3 in central
    found com.fasterxml.jackson.core#jackson-databind;2.2.3 in central
    found com.fasterxml.jackson.core#jackson-annotations;2.2.3 in central
    found com.fasterxml.jackson.core#jackson-core;2.2.3 in central
    found com.amazonaws#aws-java-sdk;1.7.4 in central
    found joda-time#joda-time;2.9.9 in central
    [2.9.9] joda-time#joda-time;[2.2,)
:: resolution report :: resolve 2170ms :: artifacts dl 65ms
    :: modules in use:
    asm#asm;3.2 from central in [default]
    com.amazonaws#aws-java-sdk;1.7.4 from central in [default]
    com.fasterxml.jackson.core#jackson-annotations;2.2.3 from central in [default]
    com.fasterxml.jackson.core#jackson-core;2.2.3 from central in [default]
    com.fasterxml.jackson.core#jackson-databind;2.2.3 from central in [default]
    com.google.code.findbugs#jsr305;3.0.0 from central in [default]
    com.google.code.gson#gson;2.2.4 from central in [default]
    com.google.guava#guava;11.0.2 from central in [default]
    com.google.protobuf#protobuf-java;2.5.0 from central in [default]
    com.jamesmurty.utils#java-xmlbuilder;0.4 from central in [default]
    com.jcraft#jsch;0.1.54 from central in [default]
    com.sun.jersey#jersey-core;1.9 from central in [default]
    com.sun.jersey#jersey-json;1.9 from central in [default]
    com.sun.jersey#jersey-server;1.9 from central in [default]
    com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
    commons-beanutils#commons-beanutils;1.7.0 from central in [default]
    commons-beanutils#commons-beanutils-core;1.8.0 from central in [default]
    commons-cli#commons-cli;1.2 from central in [default]
    commons-codec#commons-codec;1.4 from central in [default]
    commons-collections#commons-collections;3.2.2 from central in [default]
    commons-configuration#commons-configuration;1.6 from central in [default]
    commons-digester#commons-digester;1.8 from central in [default]
    commons-httpclient#commons-httpclient;3.1 from central in [default]
    commons-io#commons-io;2.4 from central in [default]
    commons-lang#commons-lang;2.6 from central in [default]
    commons-logging#commons-logging;1.1.3 from central in [default]
    commons-net#commons-net;3.1 from central in [default]
    io.netty#netty;3.6.2.Final from central in [default]
    javax.activation#activation;1.1 from central in [default]
    javax.servlet#servlet-api;2.5 from central in [default]
    javax.servlet.jsp#jsp-api;2.1 from central in [default]
    javax.xml.bind#jaxb-api;2.2.2 from central in [default]
    javax.xml.stream#stax-api;1.0-2 from central in [default]
    jline#jline;0.9.94 from central in [default]
    joda-time#joda-time;2.9.9 from central in [default]
    junit#junit;4.11 from central in [default]
    log4j#log4j;1.2.17 from central in [default]
    net.java.dev.jets3t#jets3t;0.9.0 from central in [default]
    org.apache.avro#avro;1.7.4 from central in [default]
    org.apache.commons#commons-compress;1.4.1 from central in [default]
    org.apache.commons#commons-math3;3.1.1 from central in [default]
    org.apache.curator#curator-client;2.7.1 from central in [default]
    org.apache.curator#curator-framework;2.7.1 from central in [default]
    org.apache.curator#curator-recipes;2.7.1 from central in [default]
    org.apache.directory.api#api-asn1-api;1.0.0-M20 from central in [default]
    org.apache.directory.api#api-util;1.0.0-M20 from central in [default]
    org.apache.directory.server#apacheds-i18n;2.0.0-M15 from central in [default]
    org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from central in [default]
    org.apache.hadoop#hadoop-annotations;2.7.4 from central in [default]
    org.apache.hadoop#hadoop-auth;2.7.4 from central in [default]
    org.apache.hadoop#hadoop-aws;2.7.4 from central in [default]
    org.apache.hadoop#hadoop-common;2.7.4 from central in [default]
    org.apache.htrace#htrace-core;3.1.0-incubating from central in [default]
    org.apache.httpcomponents#httpclient;4.2.5 from central in [default]
    org.apache.httpcomponents#httpcore;4.2.5 from central in [default]
    org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-xc;1.9.13 from central in [default]
    org.codehaus.jettison#jettison;1.1 from central in [default]
    org.hamcrest#hamcrest-core;1.3 from central in [default]
    org.mortbay.jetty#jetty;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-sslengine;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
    org.mortbay.jetty#servlet-api;2.5-20081211 from central in [default]
    org.slf4j#slf4j-api;1.7.10 from central in [default]
    org.slf4j#slf4j-log4j12;1.7.10 from central in [default]
    org.tukaani#xz;1.0 from central in [default]
    org.xerial.snappy#snappy-java;1.0.4.1 from central in [default]
    xmlenc#xmlenc;0.52 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   72  |   1   |   0   |   0   ||   72  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 72 already retrieved (0kB/17ms)

hadoop-aws is a relatively common package that enables Spark to interact with S3 via S3A. Every time someone runs spark-submit with that package, they are greeted with the above wall of text.

Is there a way to quiet all this output unless there is a problem? The solutions discussed here, like setting log4j.rootCategory=ERROR, don't seem to affect the above output.

Nick Chammas
  • Did you change `log4j.rootCategory=WARN` to `log4j.rootCategory=ERROR` in the `log4j.properties` file? – koiralo Feb 27 '18 at 03:02
  • @ShankarKoirala - Yes. It seems to have no impact on the above output. – Nick Chammas Feb 27 '18 at 03:20
  • Perhaps this is a setting that needs to be specified in Apache Ivy, which appears to be the thing responsible for resolving the package dependencies. – Nick Chammas Feb 27 '18 at 03:25
  • Yes, it's Ivy's output. If you were running Ivy on its own, I'd tell you to call it with `-warn` (but I don't know how Spark calls Ivy): http://ant.apache.org/ivy/history/latest-milestone/standalone.html – cantSleepNow Mar 02 '18 at 17:24
  • @cantSleepNow - It looks like there is a way to pass an Ivy settings XML file through spark-submit via the `spark.jars.ivySettings` option. Would you happen to know how to get the equivalent of `-warn` via Ivy's XML settings file? I think that would solve the problem. – Nick Chammas Mar 07 '18 at 16:30
  • Unfortunately, this is not set in ivysettings.xml. I think that if I echoed all the Ivy variables in Ant, it would be the value of one of those. But even then, it would be hard to pass it in automatically. Is there a source line in Spark where Ivy is instantiated? – cantSleepNow Mar 07 '18 at 16:32
  • @cantSleepNow - I'm not familiar with this part of Spark's source, but it looks like [Spark instantiates Ivy via a Java/Scala API](https://github.com/apache/spark/blob/6cff7d19f6a905fe425bd6892fe7ca014c0e696b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1230), as opposed to via the command line. It uses the provided Ivy settings as part of the instantiation. – Nick Chammas Mar 07 '18 at 19:45
  • If this is the only place where Ivy is used, then the following should work: `org.apache.ivy.util.Message.setDefaultLogger(new org.apache.ivy.util.DefaultMessageLogger(org.apache.ivy.util.Message.MSG_WARN));` Just put it before the line where Ivy is instantiated. Of course, you can add imports to keep the line shorter. – cantSleepNow Mar 07 '18 at 20:14
  • Did it work in the end? – cantSleepNow Mar 08 '18 at 16:04
  • @cantSleepNow - I haven't tried it yet, since it requires that I build my own version of Spark. However, if you post your comment as an answer, I'd be happy to upvote it until I do get around to trying this out. – Nick Chammas Mar 08 '18 at 23:43

2 Answers

Extracting from comments:

Since Spark uses the Ivy API, it should be possible to change the default logger by calling the following before Ivy is instantiated:

org.apache.ivy.util.Message.setDefaultLogger(new org.apache.ivy.util.DefaultMessageLogger(org.apache.ivy.util.Message.MSG_WARN));

I used MSG_WARN here, but it can be any of the message levels. As noted in the comments, the call has to go into Spark's own source before the line where Ivy is instantiated, so this means building a custom version of Spark.

cantSleepNow

After messing around with Ivy configs, I couldn't solve this problem, at least when submitting Python scripts through spark-submit, because a PySpark script only has access to the JVM after the SparkContext is initialized, which is too late. But I've got a workaround for those who may need it.

Simply delete the corresponding lines by piping the output to sed -u 'X,Yd' (delete lines X through Y; -u is for unbuffered output).

In the OP's case, we delete lines 2 through 171 of the spark-submit output:

spark-submit \
  --packages "org.apache.hadoop:hadoop-aws:2.7.4" \
  pyspark-example.py | sed -u '2,171d'

To find the line numbers, copy and paste the Ivy output and count the lines with a text editor or wc -l.
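
Counting line numbers is fragile, since the amount of Ivy output changes with the package list and the cache state. As a rough alternative, here is a minimal Python sketch that drops the Ivy lines by their shape instead of their position. The patterns are assumptions read off the output quoted in the question (Spark 2.2.1 with Ivy 2.4.0) and may need adjusting for other versions; the names drop_ivy_lines and filter_stream are made up for illustration.

```python
import re
import sys

# Unindented line shapes seen in the Ivy output quoted in the question.
# These are assumptions; other Spark/Ivy versions may print differently.
IVY_TOP_LEVEL = re.compile(
    r"^(?:Ivy Default Cache set to:"
    r"|The jars for the packages stored in:"
    r"|::\s"
    r"|\S+#\S+ added as a dependency)"
)


def drop_ivy_lines(lines):
    """Yield every line that does not look like Ivy resolution output."""
    in_ivy_block = False
    for line in lines:
        if IVY_TOP_LEVEL.match(line):
            # An unindented Ivy line: drop it and remember we are in a block.
            in_ivy_block = True
            continue
        if in_ivy_block and line[:1] in (" ", "\t"):
            # Indented detail lines ("found ...", report rows) belong to
            # the Ivy block that precedes them.
            continue
        # Anything else ends the block and is passed through untouched.
        in_ivy_block = False
        yield line


def filter_stream(src, dst):
    """Copy src to dst without Ivy lines, flushing after each kept line."""
    for kept in drop_ivy_lines(src):
        dst.write(kept)
        dst.flush()
```

To use it in place of sed, one could save this as a script (say, a hypothetical drop_ivy.py) that ends by calling filter_stream(sys.stdin, sys.stdout), and pipe spark-submit into it. If the Ivy lines turn out to be written to stderr rather than stdout on your setup, the streams would need to be merged with 2>&1 before the pipe.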

ttimasdf