
I am looking for a way to export an HDFS directory to a remote server (to a normal file system). I have sudo/root access to both servers (local and remote). The data to be exported is around 3 TB.

  • Can't use `hadoop fs -copyToLocal`, as it writes to the local server, which is too space-constrained to duplicate the data.

  • Can't use `sshfs` to mount a remote server folder, as it's not available on the server machines and new tools can't be installed due to security policies.

  • `hadoop distcp` seems to support only syncing/copying data to another HDFS cluster?

  • You could use `curl` on the remote machine to use WebHDFS. Otherwise, you would need to SSH first to the remote, then `copyToLocal` (after downloading Hadoop CLI) – OneCricketeer May 05 '23 at 15:33

1 Answer


If I understand the situation correctly, you may not be able to push the files to the remote server, but you can pull them from the Hadoop cluster onto the remote server by running a pull script there. In Python, libraries such as snakebite can do this and can connect to a remote cluster. If you are not able to install snakebite on the remote machine, you can instead use the WebHDFS REST APIs and make the HTTP calls from a script with whatever HTTP library is available (see the sketches below). As far as I know, snakebite talks to HDFS under the hood via Hadoop's native RPC protocol. See this and this.

The NameNode UI also provides an option to download files. You could inspect the URL pattern behind that download link and reuse the same pattern in your own solution.
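Below is a rough sketch of the snakebite approach, run from the remote server. It assumes snakebite can be installed there, that the NameNode RPC endpoint is `namenode-host:8020`, and that `/data/export` and `/mnt/storage/export` are the source and destination paths; all of these are placeholders, not details from the question. Note that snakebite is an older library and may not work well on Python 3.

```python
# Rough sketch: pull an HDFS directory with snakebite, run on the remote server.
# Assumptions: snakebite is installed, the NameNode RPC port is 8020,
# and the source/destination paths below are placeholders.
from snakebite.client import Client

client = Client("namenode-host", 8020, use_trash=False)

# copyToLocal returns a generator; iterating it performs the actual copy.
for result in client.copyToLocal(["/data/export"], "/mnt/storage/export"):
    print(result)
```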
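And here is a minimal WebHDFS sketch using only `requests`, again run from the remote server. The NameNode host and port (`namenode-host:9870`, the Hadoop 3 default; Hadoop 2 uses 50070), the `user.name` value, and the source/destination paths are all assumptions you would need to adapt. It does not recurse into subdirectories or handle Kerberos.

```python
# Minimal sketch: stream files out of an HDFS directory via the WebHDFS REST API.
# All hosts, ports, users and paths below are placeholders.
import os
import requests

NAMENODE = "http://namenode-host:9870"   # WebHDFS endpoint on the NameNode
HDFS_DIR = "/data/export"                # HDFS directory to export
LOCAL_DIR = "/mnt/storage/export"        # destination on the remote server
USER = "hdfs"                            # user.name for simple (non-Kerberos) auth

os.makedirs(LOCAL_DIR, exist_ok=True)

# LISTSTATUS returns the directory listing as JSON.
resp = requests.get(f"{NAMENODE}/webhdfs/v1{HDFS_DIR}",
                    params={"op": "LISTSTATUS", "user.name": USER})
resp.raise_for_status()

for entry in resp.json()["FileStatuses"]["FileStatus"]:
    if entry["type"] != "FILE":
        continue  # a real script would recurse into subdirectories
    name = entry["pathSuffix"]
    # OPEN redirects to a DataNode; requests follows the redirect and streams the bytes.
    with requests.get(f"{NAMENODE}/webhdfs/v1{HDFS_DIR}/{name}",
                      params={"op": "OPEN", "user.name": USER},
                      stream=True) as r:
        r.raise_for_status()
        with open(os.path.join(LOCAL_DIR, name), "wb") as out:
            for chunk in r.iter_content(chunk_size=8 * 1024 * 1024):
                out.write(chunk)
```

Streaming in chunks this way avoids ever staging the 3 TB on the Hadoop-side local disks; the data goes straight from the DataNodes to the remote server's file system.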