I understand that distcp is used for inter/intra-cluster transfer of data. Is it possible to use distcp to ingest data from the local file system into HDFS? I understand that you can use file:///.... to point to a local file outside of HDFS, but how reliable and fast is that compared to inter/intra-cluster transfer?
Viewed 574 times
No. Distcp can be used only for transferring data when both source and sink are HDFS. – Amal G Jose Dec 01 '14 at 19:02
1 Answer
Distcp is a MapReduce job that runs inside the Hadoop cluster. From the cluster's perspective, the file system on your machine is not local: a file:// path is resolved on the cluster node that runs the task, not on your machine. So you can't use your local file system with distcp. An alternative is to set up an FTP server on your machine that the Hadoop cluster can read from. Performance then depends on the network and the protocol used (FTP with Hadoop performs very poorly).
Using the hdfs dfs -put command could be better for small amounts of data, but it doesn't work in parallel like distcp.
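If the lack of parallelism in hdfs dfs -put is the main concern, one possible workaround is to start several put commands at once from the client. The following is only a rough sketch, assuming the hdfs command is on the PATH; the helper names build_put_command and upload_all and the example paths are made up for illustration. Unlike distcp, all the data still flows through the single client machine, so its disk and network remain the bottleneck.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_put_command(local_path, hdfs_dir):
    """Return the argv for uploading one local file with `hdfs dfs -put`."""
    return ["hdfs", "dfs", "-put", str(local_path), hdfs_dir]


def upload_all(local_dir, hdfs_dir, workers=4, runner=subprocess.run):
    """Upload every regular file in local_dir, running up to `workers` puts at once.

    `runner` is a hypothetical hook (defaults to subprocess.run) so the
    function can be exercised without a real cluster.
    """
    files = sorted(p for p in Path(local_dir).iterdir() if p.is_file())
    commands = [build_put_command(p, hdfs_dir) for p in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # check=True makes subprocess.run raise if a put exits non-zero;
        # list() forces every task to finish and re-raises any failure.
        list(pool.map(lambda cmd: runner(cmd, check=True), commands))
    return commands


# Example call (assumed paths, not from the question):
# upload_all("/data/incoming", "/user/hadoop/incoming", workers=4)
```

Threads are enough here because each worker just waits on an external hdfs process; the worker count would need tuning against what the client machine can actually sustain.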

RojoSam