
I would like to initialize the Spark context in Python from Scala.

I have added the 'pyspark' package to do this. This is the code I have tried, and it works fine.

Code snippet:

import sys.process._

// Handle to the Python process's stdin; assigned by BasicIO.standard below.
var os: java.io.OutputStream = _
// Start an interactive Python session and capture a handle to its stdin.
val python = Process(Seq("python", "-i")).run(BasicIO.standard(os = _))

// Send one line of Python source to the interpreter.
def pushLine(s: String): Unit = {
  os.write(s"$s\n".getBytes("UTF-8"))
  os.flush()
}

// Initialize SparkContext and SQLContext inside the Python process.
pushLine("from pyspark import SparkContext, SparkConf;from pyspark.sql import SQLContext;conf = SparkConf().setAppName('test').setMaster('local');sc = SparkContext(conf=conf);sqlContext = SQLContext(sc);")

Now, my requirement is to suppress the output from this Python process that gets echoed in the Scala console. Is there any option to avoid this?
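For context on where the output comes from: BasicIO.standard(os = _) only captures the child's stdin, while its stdout and stderr are forwarded to the current process, which is why everything printed by Python/Spark shows up in the Scala console. Below is a minimal, untested sketch of what I have been considering: an explicit ProcessIO that keeps the stdin handle but silently drains the other two streams (the silentIO name and the drain helper are mine, not part of the code above).

import sys.process._
import java.io.{InputStream, OutputStream}

var os: OutputStream = _

// Drain a stream so the child never blocks on a full pipe,
// but do not echo anything to the Scala console.
def drain(in: InputStream): Unit = {
  val buf = new Array[Byte](4096)
  while (in.read(buf) != -1) {}
  in.close()
}

// in:  keep the handle to Python's stdin so pushLine still works
// out/err: swallow the child's stdout and stderr instead of inheriting them
val silentIO = new ProcessIO(
  stdin  => { os = stdin },
  stdout => drain(stdout),
  stderr => drain(stderr)
)

val python = Process(Seq("python", "-i")).run(silentIO)

Note that this hides Python tracebacks as well; reducing Spark's log verbosity via log4j is the other option.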

(screenshot: Spark Context Initialization output in the Scala console)

Thanks in advance :)

Ramkumar

1 Answer


The method below worked for me.

  1. Create a file log4j.properties in some directory, say /home/vijay/py-test-log:

    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

  2. cd /home/vijay/py-test-log // the log4j.properties file should be here

  3. Then launch pyspark from this directory, in which you have log4j.properties:

    $pwd
    /home/vijay/py-test-log
    $/usr/lib/spark-1.2.0-bin-hadoop2.3/bin/pyspark

  4. That's all. pyspark will load the log4j.properties file from the directory where it is launched (see the sketch below for pointing the Scala-launched process at this directory).
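To tie this back to the question's Scala launcher: scala.sys.process.Process also accepts a working directory, so the Python child can be started from the folder that holds log4j.properties. A minimal sketch, assuming the /home/vijay/py-test-log path from step 1 and that the cwd-based lookup behaves the same when plain python (rather than bin/pyspark) creates the SparkContext:

import sys.process._
import java.io.File

// Start the interactive Python session with its working directory set to the
// folder containing log4j.properties, so Spark's logging config is picked up.
val logDir = new File("/home/vijay/py-test-log")
var os: java.io.OutputStream = _
val python = Process(Seq("python", "-i"), logDir).run(BasicIO.standard(os = _))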
vijay kumar