I have 5 TB of data that I need to transfer from HDFS to a GCP bucket from the command line.
I tried hadoop distcp -m num -strategy dynamic source_path destination_path, but it has been running for a long time and has not finished.
Is there an alternative command for copying this much data from an HDFS location to a GCP bucket?
To test, I ran distcp on 50 GB of data with different numbers of mappers, using:
hadoop distcp -m num -strategy dynamic source_path destination_path
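For reference, a minimal sketch of how such a timing run could be scripted. The `time_distcp` helper and the `DISTCP` variable are hypothetical names introduced here, and `source_path` / `destination_path` are the same placeholders as in the command above. It assumes a fresh destination for each run, since distcp may skip files that already exist at the target.

```shell
#!/bin/sh
# DISTCP is a hypothetical override so the helper can be dry-run;
# by default it is the real command used in this post.
DISTCP="${DISTCP:-hadoop distcp}"

# Run one copy with the given mapper count and print the elapsed wall-clock seconds.
# usage: time_distcp MAPPERS SOURCE_PATH DESTINATION_PATH
time_distcp() {
  start=$(date +%s)
  # $DISTCP is left unquoted on purpose so "hadoop distcp" splits into two words.
  $DISTCP -m "$1" -strategy dynamic "$2" "$3" >/dev/null 2>&1
  status=$?
  end=$(date +%s)
  echo "-m $1: $((end - start)) s (exit status $status)"
}

# Example (placeholders; use a fresh destination per run):
# for m in 18 22 44 60; do time_distcp "$m" source_path destination_path; done
```

Wall-clock timing like this also includes job startup, so short runs will look slower per GB than long ones.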
I tried the following mapper counts:
- with -m 18 -> it took 16 mins
- with -m 22 -> it took 12 mins
- with -m 44 -> it took 18 mins
- with -m 60 -> it took 5 mins 20 sec
- with -m 72 -> it took 5 mins 9 sec
- with -m 80 -> it took 5 mins 7 sec
- with -m 84 -> it took 16 mins 10 sec
- with -m 88 -> it took 11+ mins
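As a rough sketch, these timings can be turned into throughput and projected to the full 5 TB. This assumes the copy time scales linearly with data size and that 5 TB = 5000 GB, neither of which has been verified here. The helper names `project_hours` and `throughput_mb_s` are hypothetical, and the 88-mapper run is entered as 660 s, which is only a lower bound for "11+ mins".

```shell
#!/bin/sh
# Projected hours for TOTAL_GB, given a sample that copied SAMPLE_GB in SAMPLE_SECONDS.
# Assumes linear scaling with data size.
# usage: project_hours SAMPLE_SECONDS SAMPLE_GB TOTAL_GB
project_hours() {
  awk -v s="$1" -v g="$2" -v t="$3" 'BEGIN { printf "%.1f", s * t / g / 3600 }'
}

# Average throughput in MB/s (1 GB taken as 1000 MB).
# usage: throughput_mb_s SAMPLE_SECONDS SAMPLE_GB
throughput_mb_s() {
  awk -v s="$1" -v g="$2" 'BEGIN { printf "%.0f", g * 1000 / s }'
}

# mappers:seconds pairs from the 50 GB test runs listed above.
for pair in 18:960 22:720 44:1080 60:320 72:309 80:307 84:970 88:660; do
  m=${pair%%:*}
  s=${pair##*:}
  echo "-m $m: $(throughput_mb_s "$s" 50) MB/s, 5 TB in about $(project_hours "$s" 50 5000) h"
done
```

The non-monotonic results (44 and 84 mappers slower than their neighbours) suggest the runs were also affected by something other than mapper count, such as cluster load, so a projection like this is only indicative.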
Can someone please suggest an alternative to distcp?