My question is similar to : hadoop streaming: how to see application logs? (The link in the answer is not currently working. So I have to post it again with an additional question)
I can see all hadoop logs on my /usr/local/hadoop/logs path
but where can I see application level logs? for example :
reducer.py -
import logging
....
logging.basicConfig(level=logging.ERROR, format='MAP %(asctime)s%(levelname)s%(message)s')
logging.error('Test!')
...
I am not able to see any of the logs (WARNING,ERROR) in stderr.
Where I can find my log statements of the application? I am using Python and using hadoop-streaming.
Additional question :
If I want to use a file to store/aggregate my application logs like :
reducer.py -
....
logger = logging.getLogger('test')
hdlr = logging.FileHandler(os.environ['HOME']+'/test.log')
formatter = logging.Formatter('MAP %(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.ERROR)
logger.error('please work!!')
.....
(Assuming that I have test.log in $HOME location of master & all slaves in my hadoop cluster). Can I achieve this in a distributed environment like Hadoop? If so, how can achieve this?
I tried this and ran a sample streaming job, but to only see the below error :
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:330)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:543)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:484)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Please help me understand how logging can be achieved in hadoop streaming jobs.
Thank you