Syntax error on topology.py when I try to run scala command in spark through Cloudera VM

Question

Every time I try to run the following Scala commands

    val dataRDD = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/data/data.txt")
    dataRDD.collect().foreach(println)
    // or
    dataRDD.count()

I get the following exception:

    exitCodeException exitCode=1:   File "/etc/hadoop/conf.cloudera.yarn/topology.py", line 43
        print default_rack
                         ^
    SyntaxError: Missing parentheses in call to 'print'

I am running Spark 1.6.0 on the Cloudera VM. Has anyone else faced this issue? What could be the reason? I understand that it comes from the 'topology.py' file, which calls print without the parentheses that Python 3 requires. But why is this script being executed when I am not running Python or PySpark? This only happens on the Cloudera VM; when I run the commands outside the VM with some other sample data, they work!

Also seeing this. No answer yet, though. – Oct 13 '16 at 19:38

score 1 · Answer 1 · answered Jan 10 '18 at 02:45

I know it might be too late, but I am posting the answer anyway in case any other user faces the same issue.

This is a known issue, and the workaround is as follows:

Workaround: Add a YARN gateway role to each host that does not already have at least one YARN role (of any type). The YARN gateway needs to be added on the node/host where you are seeing this issue.
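
If you want to confirm that the error comes from the interpreter rather than from Spark itself, a minimal sketch like the one below may help. It only checks whether the statement from the trace parses under whatever Python runs it. The helper name `parses` and the file label `<topology-line>` are made up for illustration, and the rest of topology.py is not reproduced here.

```python
import sys

# The statement reported at line 43 of topology.py, copied from the trace.
PY2_LINE = "print default_rack\n"
# Parenthesized form, which parses under both Python 2 and Python 3.
PORTABLE_LINE = "print(default_rack)\n"


def parses(source):
    """Return True if `source` compiles under the running interpreter.

    Hypothetical helper: it only parses the text and never executes it,
    so `default_rack` does not need to be defined.
    """
    try:
        compile(source, "<topology-line>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("interpreter: " + sys.version.split()[0])
    print("print statement parses: " + str(parses(PY2_LINE)))
    print("print() call parses: " + str(parses(PORTABLE_LINE)))
```

Run it with the same `python` that is first on the PATH of the affected host. If the print statement does not parse, that interpreter is Python 3, which matches the SyntaxError in the question.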
