I have Spark 1.2.0 on a CDH 5.3 cluster.

I managed to make my Spark application log to the local file system thanks to a custom log4j.properties file bundled inside the jar. This works as long as Spark is launched in yarn-client mode, but it is not feasible in yarn-cluster mode, since there is no way to know in advance on which machine the driver will run.
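For context, the bundled file is essentially a standard local FileAppender configuration, something like the following minimal sketch (the appender name and the local path are placeholders):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/spark-app/driver.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n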

I took a look at YARN log aggregation and at the files produced under hdfs://nameservice1/user/spark/applicationHistory/application_1444387971657_0470/*, but their content does not match the file on the local filesystem at all; instead it is information like this:

{"Event":"SparkListenerTaskEnd","Stage ID":1314,"Stage Attempt ID":0,"Task Type":"ResultTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":3120,"Index":1,"Attempt":0,"Launch Time":1445512311024,"Executor ID":"3","Host":"usqrtpl5328.internal.unicreditgroup.eu","Locality":"RACK_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1445512311685,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"usqrtpl5328.internal.unicreditgroup.eu","Executor Deserialize Time":5,"Executor Run Time":652,"Result Size":1768,"JVM GC Time":243,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Read Metrics":{"Remote Blocks Fetched":26,"Local Blocks Fetched":10,"Fetch Wait Time":0,"Remote Bytes Read":16224},"Output Metrics":{"Data Write Method":"Hadoop","Bytes Written":82983}}}

Now, is there a way to log everything I want, and only what I want, to HDFS?

Any suggestion is welcome.

EDIT: I had already seen this question when I posted mine. It does not solve my problem, since I need to log to HDFS, and that case is not considered there.

I do not even know whether it is possible to log directly to HDFS with log4j. If you have any idea how to write the log4j.properties accordingly, please share.
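I am not aware of a stock log4j 1.x appender that writes to HDFS, so I imagine a custom appender built on the Hadoop FileSystem API would be needed. A rough, untested sketch of what I have in mind (the class name HdfsAppender, the package, and all paths are placeholders of mine):

package com.example;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Hypothetical sketch: a log4j 1.x appender that writes each event to a file on HDFS.
public class HdfsAppender extends AppenderSkeleton {

    private String path;            // set from log4j.properties via setPath
    private FSDataOutputStream out;

    public void setPath(String path) { this.path = path; }
    public String getPath() { return path; }

    @Override
    public void activateOptions() {
        try {
            // Resolve the filesystem from the path's scheme, using the
            // cluster configuration found on the classpath (core-site.xml etc.)
            FileSystem fs = new Path(path).getFileSystem(new Configuration());
            out = fs.create(new Path(path), true);  // overwrite if the file exists
        } catch (IOException e) {
            errorHandler.error("Cannot open HDFS path " + path, e, 0);
        }
    }

    @Override
    protected void append(LoggingEvent event) {
        if (out == null) {
            return;
        }
        try {
            out.writeBytes(layout.format(event));
            out.hsync();  // flush to the datanodes so the file is readable while open
        } catch (IOException e) {
            errorHandler.error("Cannot write to HDFS path " + path, e, 0);
        }
    }

    @Override
    public void close() {
        try {
            if (out != null) {
                out.close();
            }
        } catch (IOException ignored) {
            // nothing sensible to do on shutdown
        }
        closed = true;
    }

    @Override
    public boolean requiresLayout() {
        return true;
    }
}

It could then be referenced from log4j.properties like any other appender:

log4j.rootLogger=INFO, hdfs
log4j.appender.hdfs=com.example.HdfsAppender
log4j.appender.hdfs.Path=hdfs://nameservice1/user/spark/logs/driver.log
log4j.appender.hdfs.layout=org.apache.log4j.PatternLayout
log4j.appender.hdfs.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

One thing I am not sure about: HDFS allows only one writer per file, so the driver and each executor would presumably need a distinct path (for example, suffixed with the container ID).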
