Note: I am not asking how to specify a YARN queue name, as in "Hadoop: specify yarn queue for distcp".


I frequently use hadoop distcp for moving data around HDFS and would like to have a descriptive application name for these jobs.


Presently, all copy jobs appear under the name "distcp" on the Resource Manager UI, and there is no way to distinguish between them.


Is there a way to give each distcp job its own application name?

y2k-shubham

1 Answer

Like many other MapReduce tools, hadoop distcp lets you pass MapReduce job properties as generic options, placed before the other distcp arguments:

-Dmapred.property.name=property-value


so when I use

hadoop distcp \
  -Dmapred.job.name=billing_db.replicate \
  -m 10 \
  /user/hive/warehouse/billing_db.db/ \
  s3a://my-s3-bucket/billing_db.db/

it appears under that name on the Resource Manager UI.
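If you run many copies, a small wrapper can derive the name for you. The following is only a sketch: the function names `distcp_job_name` and `distcp_named` and the `<database>.replicate` naming scheme are hypothetical, chosen to match the example above. It prints the command rather than running it, so you can check the generated name first.

```shell
#!/bin/sh

# Hypothetical helper: derive a job name from the last component of the
# source path, e.g. /user/hive/warehouse/billing_db.db/ -> billing_db.replicate
distcp_job_name() {
  src=${1%/}        # drop a trailing slash, if any
  base=${src##*/}   # keep only the last path component
  printf '%s.replicate\n' "${base%.db}"   # drop a ".db" suffix, if any
}

# Hypothetical wrapper: print the distcp command with the derived job name.
# Remove the leading "echo" to actually launch the copy.
distcp_named() {
  src=$1
  dest=$2
  name=$(distcp_job_name "$src")
  echo hadoop distcp "-Dmapred.job.name=$name" -m 10 "$src" "$dest"
}
```

For example, `distcp_named /user/hive/warehouse/billing_db.db/ s3a://my-s3-bucket/billing_db.db/` prints the same command as above. The `-m 10` mapper count is simply carried over from that command; adjust it to your cluster.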

y2k-shubham