I am having trouble with the s3-dist-cp command on emr-5.0.0. My application needs to push some files from HDFS to S3, and it uses s3-dist-cp to do so. This worked fine on emr-4.2.0, but it does not work on emr-5.0.0. If I run the command manually, it works; it fails only when run from my application. I did not change anything in my application to run it on emr-5.

Do I need to change anything to use emr-5? Has the way s3-dist-cp is used changed in emr-5?

I am using the following command:

s3-dist-cp --src /user/hive/warehouse/abc.text --dest s3n://bucket/abc.text
Kristian
bipulendra
  • Including the error in your question would sure help... ;-) – Jonathan Kelly Oct 02 '16 at 16:34
  • also, AFAIK `s3n` is deprecated, use `s3://` from now on – Kristian Oct 03 '16 at 15:03
  • You can also always create a support ticket with AWS if you think that there's a problem specific to their environment or changes, assuming that you have a support plan (which is totally worth it, in my opinion, because their support is spectacular). – devinbost Dec 19 '18 at 00:28

3 Answers

s3-dist-cp (s3-dist-cp.jar) is only available on the master node.

The following is the location of the application.

/usr/share/aws/emr/s3-dist-cp/

The s3-dist-cp.jar is not available on the slave nodes.
You can log in to a slave machine and verify this.

So the likely reason for your application's failure is that, on the new EMR release, you are using some workflow management tool that deploys the application to the slave nodes and starts it from there. Since s3-dist-cp is not available on those nodes, the command fails.
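One way to confirm this from inside the application is to run a small check on the node where the command actually executes. The following is only a sketch: `has_s3distcp` is a made-up helper name, and the default directory is simply the install location quoted above.

```shell
#!/bin/sh
# Sketch: report whether s3-dist-cp looks usable on the current node.
# has_s3distcp is a hypothetical helper; the default directory is the
# install location mentioned in this answer.
has_s3distcp() {
    s3distcp_dir="${1:-/usr/share/aws/emr/s3-dist-cp}"
    # Success if the wrapper command is on PATH ...
    if command -v s3-dist-cp >/dev/null 2>&1; then
        return 0
    fi
    # ... or if the install directory contains at least one jar.
    ls "$s3distcp_dir"/*.jar >/dev/null 2>&1
}

if has_s3distcp; then
    echo "s3-dist-cp found on $(uname -n)"
else
    echo "s3-dist-cp NOT found on $(uname -n)"
fi
```

Running this as the first step of the failing job should show which node the job lands on and whether the tool is present there.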

Workaround, first option

Bundle the jar with your application and use the following command:

hadoop jar s3-dist-cp.jar --src location --dest location 

Second option

Bootstrap the s3-dist-cp jars onto the cluster.

You can even run it as a Java program.
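The two workarounds can be combined in a small wrapper that uses the s3-dist-cp command when it is on the PATH and otherwise falls back to `hadoop jar` with a jar you supply. This is a rough sketch, not a tested EMR recipe: `run_s3distcp` and the `S3DISTCP_JAR` variable are names invented here, and the jar path must point at a copy you bundled or bootstrapped yourself.

```shell
#!/bin/sh
# Sketch: invoke S3DistCp through whichever entry point the node provides.
# run_s3distcp and S3DISTCP_JAR are hypothetical names for illustration.
run_s3distcp() {
    if command -v s3-dist-cp >/dev/null 2>&1; then
        # The wrapper script exists on this node (e.g. the master node).
        s3-dist-cp "$@"
    else
        # Fall back to the bundled or bootstrapped jar.
        hadoop jar "${S3DISTCP_JAR:-s3-dist-cp.jar}" "$@"
    fi
}

# Example call (commented out so the sketch has no side effects):
# run_s3distcp --src location --dest location
```

Passing the arguments through `"$@"` keeps paths with spaces intact, which a string-built command line would not.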

loneStar

First, s3n:// is now deprecated; start using s3:// for S3 paths.

Secondly, if you're merely copying a file into S3 from a local file on your cluster, you can use aws s3 cp:

aws s3 cp /user/hive/warehouse/abc.text s3://bucket/abc.text
Kristian

The syntax that you have used for s3-dist-cp is incorrect. Please try again with the command below.

s3-dist-cp --src hdfs:///user/hive/warehouse/abc.text --dest s3n://bucket/abc.text
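If the paths come from application configuration, a small helper can make the scheme explicit before the command is built. The sketch below assumes that bare absolute paths are meant to be HDFS paths, and it also applies the advice given elsewhere in this thread to prefer s3:// over s3n://; `normalize_uri` is a made-up name.

```shell
#!/bin/sh
# Sketch: give every path an explicit scheme before passing it to s3-dist-cp.
# normalize_uri is a hypothetical helper; treating bare absolute paths as
# HDFS paths is an assumption about how the application uses them.
normalize_uri() {
    case "$1" in
        s3n://*) echo "s3://${1#s3n://}" ;;  # swap the deprecated scheme
        *://*)   echo "$1" ;;                # already has a scheme, keep it
        /*)      echo "hdfs://$1" ;;         # /a/b becomes hdfs:///a/b
        *)       echo "$1" ;;                # leave anything else untouched
    esac
}

normalize_uri /user/hive/warehouse/abc.text
normalize_uri s3n://bucket/abc.text
```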

Let me know if this solves your problem.

Pfunk