
I am trying to run a map reduce job, but I am unable to find my log files when the job runs. I am using a Hadoop streaming job to perform map reduce, with the mapper written in Python, and I am using Python's logging module to log messages. When I run the mapper on a file directly with the "cat" command, the log file is created.

cat file | ./mapper.py 

But when I run the job via Hadoop, I am unable to find the log file.

import os, logging

# Write INFO-level messages to a local file next to the script
logging.basicConfig(filename="myApp.log", level=logging.INFO)
logging.info("app start")

##
## logic with log messages
##

logging.info("app complete")

But I cannot find the myApp.log file anywhere. Is the log data stored somewhere, or does Hadoop ignore application logging completely? I have searched for my log messages in the userlogs folder too, but they do not seem to be there.

I work with vast amounts of data, and random items are not making it to the next stage. This is a very big issue on our side, so I am trying to find a way to use logging to debug my application.

Any help is appreciated.

– macha
    Can you try adding a standard output handler to your logger as described here? http://stackoverflow.com/questions/14058453/making-python-loggers-output-all-messages-to-stdout-in-addition-to-log – jaynp Apr 15 '14 at 02:16

1 Answer


I believe you are logging to stdout? If so, you should definitely log to stderr instead, or create your own custom stream.

With Hadoop streaming, stdout is the stream dedicated to passing key-value pairs between mappers/reducers and to emitting results, so you should not log anything to it.
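As a minimal sketch of that idea (assuming a standalone Python mapper script that reads records from stdin), you could point the logging module at stderr so log messages never mix with the key-value output on stdout:

import sys
import logging

# Log to stderr so stdout stays reserved for key-value output
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.info("app start")

for line in sys.stdin:
    # ... mapper logic; emit key-value pairs on stdout as usual ...
    sys.stdout.write(line)

logging.info("app complete")

Anything a streaming task writes to stderr should then end up in that task attempt's stderr file under the userlogs directory (also viewable from the job's web UI), rather than in a myApp.log file on whichever node the task happened to run.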

– Yann