
I've been using the `JobClient.monitorAndPrintJob()` method to print the output of a MapReduce job to the console. My usage looks like this:

```java
job_client.monitorAndPrintJob(job_conf, job_client.getJob(j.getAssignedJobID()));
```

The output, printed on the console, is as follows:

```
13/03/04 07:20:00 INFO mapred.JobClient: Running job: job_201302211725_10139
13/03/04 07:20:01 INFO mapred.JobClient:  map 0% reduce 0%
13/03/04 07:20:08 INFO mapred.JobClient:  map 100% reduce 0%
13/03/04 07:20:13 INFO mapred.JobClient:  map 100% reduce 100%
13/03/04 07:20:13 INFO mapred.JobClient: Job complete: job_201302211725_10139
13/03/04 07:20:13 INFO mapred.JobClient: Counters: 26
13/03/04 07:20:13 INFO mapred.JobClient:   Job Counters
13/03/04 07:20:13 INFO mapred.JobClient:     Launched reduce tasks=1
13/03/04 07:20:13 INFO mapred.JobClient:     Aggregate execution time of mappers(ms)=5539
13/03/04 07:20:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/04 07:20:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/03/04 07:20:13 INFO mapred.JobClient:     Launched map tasks=2
13/03/04 07:20:13 INFO mapred.JobClient:     Data-local map tasks=2
13/03/04 07:20:13 INFO mapred.JobClient:     Aggregate execution time of reducers(ms)=4337
13/03/04 07:20:13 INFO mapred.JobClient:   FileSystemCounters
13/03/04 07:20:13 INFO mapred.JobClient:     MAPRFS_BYTES_READ=583
13/03/04 07:20:13 INFO mapred.JobClient:     MAPRFS_BYTES_WRITTEN=394
13/03/04 07:20:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=140219
13/03/04 07:20:13 INFO mapred.JobClient:   Map-Reduce Framework
13/03/04 07:20:13 INFO mapred.JobClient:     Map input records=6
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce shuffle bytes=136
13/03/04 07:20:13 INFO mapred.JobClient:     Spilled Records=22
13/03/04 07:20:13 INFO mapred.JobClient:     Map output bytes=116
13/03/04 07:20:13 INFO mapred.JobClient:     CPU_MILLISECONDS=1320
13/03/04 07:20:13 INFO mapred.JobClient:     Map input bytes=64
13/03/04 07:20:13 INFO mapred.JobClient:     Combine input records=13
13/03/04 07:20:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=180
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce input records=11
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce input groups=11
13/03/04 07:20:13 INFO mapred.JobClient:     Combine output records=11
13/03/04 07:20:13 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=734961664
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce output records=11
13/03/04 07:20:13 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=9751805952
13/03/04 07:20:13 INFO mapred.JobClient:     Map output records=13
13/03/04 07:20:13 INFO mapred.JobClient:     GC time elapsed (ms)=0
```

I would like the above output/log to be written to a text file rather than the console. Any suggestions?

j0k
Eashwar

1 Answer


In your `HADOOP_HOME/conf` directory you will find a file named `log4j.properties`. You can configure where and how to log in there.

To be precise, you should use a rolling file appender, so uncomment (just remove the leading `#`) the following lines in the `log4j.properties` file:

```properties
#
# Rolling File Appender
#

#log4j.appender.RFA=org.apache.log4j.RollingFileAppender
#log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Logfile size and 30-day backups
#log4j.appender.RFA.MaxFileSize=1MB
#log4j.appender.RFA.MaxBackupIndex=30

#log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
```

And customize the other parameters to your liking.
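After removing the `#` markers (keeping only one of the two `ConversionPattern` variants), the active part of the file would look roughly like this. The file name and sizes shown are the stock defaults, and you would also need to route the root logger to the appender; the exact root-logger line (e.g. `hadoop.root.logger=INFO,RFA`) varies by Hadoop version, so check your own file:

```properties
# Route root logging to the rolling file appender (version-dependent line)
hadoop.root.logger=INFO,RFA

log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Logfile size and 30-day backups
log4j.appender.RFA.MaxFileSize=1MB
log4j.appender.RFA.MaxBackupIndex=30

log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```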

For more about log4j configuration, read here.

Amar
  • Thanks for your response @Amar. However, I do not have access to the Hadoop conf files at the moment. Can I use the log4j APIs, or can I programmatically redirect the logs into a text file rather than modifying the properties file? – Eashwar Mar 04 '13 at 15:38
  • Where are you executing Hadoop from? – Amar Mar 04 '13 at 15:42
  • You may even have your own properties file and point to that from the command line while running Hadoop: `-Dlog4j.configuration=my-log4j.properties` – Amar Mar 04 '13 at 15:46
  • I would try the solution you suggested, but is it also possible programmatically? – Eashwar Mar 04 '13 at 15:55
  • Yes, I think it should be possible programmatically too. Look at this answer: http://stackoverflow.com/questions/14191070/controlling-logging-functionality-in-hadoop – Amar Mar 04 '13 at 16:03
  • Thanks again @Amar, but I'm using Apache Commons Logging. How do I change the configuration programmatically in this case? – Eashwar Mar 05 '13 at 11:57
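For the programmatic route discussed in the comments above, a minimal sketch using the log4j 1.x API directly (log4j is already on the classpath of a Hadoop 1.x client). The file name `job-output.log` is illustrative; the pattern mirrors the `ConversionPattern` from the properties file:

```java
import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.RollingFileAppender;

public class LogToFile {
    public static void main(String[] args) throws IOException {
        // Build a rolling file appender equivalent to the commented-out
        // RFA entries in log4j.properties.
        PatternLayout layout = new PatternLayout("%d{ISO8601} %-5p %c{2} - %m%n");
        RollingFileAppender rfa = new RollingFileAppender(layout, "job-output.log");
        rfa.setMaxFileSize("1MB");
        rfa.setMaxBackupIndex(30);

        // Attach it to the root logger so mapred.JobClient's log lines
        // (which go through log4j) end up in the file.
        Logger.getRootLogger().addAppender(rfa);

        // ... then submit and monitor the job as before, e.g.:
        // job_client.monitorAndPrintJob(job_conf, job_client.getJob(j.getAssignedJobID()));
    }
}
```

Note this only helps if the logging actually flows through log4j; Commons Logging normally delegates to log4j when it is on the classpath, so the same appender should pick those messages up as well.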