
My question is similar to: hadoop streaming: how to see application logs? (The link in the answer there is no longer working, so I have to post it again, with an additional question.)

I can see all the Hadoop logs under my /usr/local/hadoop/logs path,

but where can I see application-level logs? For example:

reducer.py -

import logging
....
# basicConfig with no stream argument writes to stderr by default
logging.basicConfig(level=logging.ERROR, format='MAP %(asctime)s%(levelname)s%(message)s')
logging.error('Test!')
...
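As I understand it, logging.basicConfig with no stream argument writes to stderr by default, so a stripped-down reducer along these lines should in principle emit visible messages (a minimal sketch, with an identity pass-through standing in for my real reduce logic):

#!/usr/bin/env python
import sys
import logging

# No stream argument: basicConfig logs to stderr by default, which is
# where hadoop-streaming expects diagnostics to go.
logging.basicConfig(level=logging.ERROR, format='RED %(asctime)s %(levelname)s %(message)s')
logging.error('reducer started')

# Identity reducer: stdout is reserved for data records in streaming.
for line in sys.stdin:
    sys.stdout.write(line)

logging.error('reducer finished')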

I am not able to see any of these logs (WARNING, ERROR) in stderr.

Where can I find my application's log statements? I am using Python with hadoop-streaming.

Additional question:

If I want to use a file to store/aggregate my application logs, like:

reducer.py -

import os
import logging
....
logger = logging.getLogger('test')
hdlr = logging.FileHandler(os.environ['HOME'] + '/test.log')
formatter = logging.Formatter('MAP %(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.ERROR)
logger.error('please work!!')
.....

(Assuming that test.log exists in the $HOME location on the master and all slaves in my Hadoop cluster.) Can I achieve this in a distributed environment like Hadoop? If so, how?
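One approach that might work (a sketch only; it assumes YARN exports the LOG_DIRS environment variable inside each container, which depends on the Hadoop version) is to write the extra log file into the container's own log directory instead of $HOME, so it sits next to the stdout/stderr/syslog files that YARN already collects:

import os
import sys
import logging

logger = logging.getLogger('test')
log_dirs = os.environ.get('LOG_DIRS')  # assumption: set by the NodeManager
if log_dirs:
    # Write next to this container's stdout/stderr/syslog files
    hdlr = logging.FileHandler(os.path.join(log_dirs.split(',')[0], 'test.log'))
else:
    # Fall back to stderr if the variable is not present
    hdlr = logging.StreamHandler(sys.stderr)
hdlr.setFormatter(logging.Formatter('MAP %(asctime)s %(levelname)s %(message)s'))
logger.addHandler(hdlr)
logger.setLevel(logging.ERROR)
logger.error('please work!!')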

I tried this and ran a sample streaming job, only to see the error below:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:330)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:543)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
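(Side note: if $HOME is unset or not writable inside the YARN container, the logging.FileHandler call above raises an exception before any input is read, the Python process exits non-zero, and streaming reports exactly this kind of failure. That is an unconfirmed guess; a defensive sketch of the handler setup would be:)

import os
import sys
import logging

logger = logging.getLogger('test')
try:
    # Raises KeyError if HOME is unset, IOError/OSError if not writable
    hdlr = logging.FileHandler(os.environ['HOME'] + '/test.log')
except (KeyError, IOError, OSError):
    # Fall back to stderr rather than letting the exception kill the task
    hdlr = logging.StreamHandler(sys.stderr)
logger.addHandler(hdlr)
logger.error('please work!!')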

Please help me understand how logging can be achieved in Hadoop streaming jobs.

Thank you

annunarcist
  • possible duplicate of http://stackoverflow.com/questions/7894770/hadoop-streaming-how-to-see-application-logs – Pradeep Gollakota Jun 02 '15 at 03:43
  • I know it is a repeated question, and I mentioned that in the first line of my question too. But the link in that answer is broken, and I am still not able to see warning/error logs in stderr when I use the code mentioned above. Also, I have an additional question regarding aggregating logs in a file. – annunarcist Jun 02 '15 at 03:58

2 Answers


Try this HDFS path: /yarn/apps/${user_name}/logs/application_${appid}/

In general:

Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.

If you print to stderr, you'll find your output in files under the directory mentioned above. There should be one file per container.
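If log aggregation is enabled on the cluster (an assumption; it is disabled in some setups), the same logs should also be retrievable after the job finishes with `yarn logs -applicationId <application id>`, instead of browsing the filesystem directly.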

  • Thank you. I recently figured this out, and it partly answers my question: I can now see the log statements in stdout and stderr. – annunarcist Jun 10 '15 at 19:04

Be aware that hadoop-streaming uses stdout to pipe data from mappers to reducers. So if your logging system writes to stdout, it will very likely break your logic and your job. One way to log is to write to stderr; your messages will then show up in the task's error logs.
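A minimal illustration of that separation (a sketch, not from the original answer; the tab-separated key/value handling shown is the streaming default):

#!/usr/bin/env python
import sys

# stdout carries key/value records between streaming stages;
# diagnostics go to stderr instead.
for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    sys.stderr.write('LOG saw key=%s\n' % key)   # log channel
    sys.stdout.write('%s\t%s\n' % (key, value))  # data channel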

Yann
  • I tried ERROR and WARNING logs too, but I am still not able to see them in stderr. My bad, I will edit the question to avoid confusion. – annunarcist Jun 02 '15 at 03:28
  • I don't know Python, but are you sure that you are writing to stderr? – Yann Jun 02 '15 at 03:32
  • Yes, I used the `logging.basicConfig(level=logging.ERROR, format='MAP %(asctime)s%(levelname)s%(message)s') logging.error('Test!')` snippet in my Python code, and it worked fine as a normal Python script: I could see the log statement in stderr in that case. But when I used the same snippet in my Hadoop reducer.py, it didn't work; stderr is empty. – annunarcist Jun 02 '15 at 04:02