
I am trying to copy a file from S3 to my Hadoop HDFS on Amazon EC2.

The command that I am using is:

bin/hadoop distcp s3://<awsAccessKeyId>:<awsSecretAccessKey>@<bucket_name>/f1 hdfs://user/root/
  • f1 is the name of the file.
  • I have also changed the scheme to s3n to see if that works, but it does not.
  • I replaced the forward slash in my secret access key with %2F.

The error that I get is SignatureDoesNotMatch:

org.jets3t.service.S3ServiceException: S3 GET failed for '/%2Ff1'

<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>

<StringToSignBytes>...</StringToSignBytes>

<RequestId>...</RequestId>

<HostId>..</HostId>

<SignatureProvided>NsefW5en6P728cc9llkFIk6yGc4=</SignatureProvided>

<StringToSign>GETMon, 05 Aug 2013 15:28:21 GMT/<bucket_name>/%2Ff1</StringToSign>

<AWSAccessKeyId><MY_ACCESS_ID></AWSAccessKeyId></Error>

I have only one AWS Access Key ID and Secret Key. I checked my AWS account and they are correct; they are the same credentials I use to log on to my EC2 cluster. I have also tried setting the keys in core-site.xml, but that has not helped either.
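For reference, a minimal sketch of what that core-site.xml attempt looks like (the fs.s3n property names are the standard ones for the s3n:// scheme; the values below are placeholders for my real keys):

<configuration>
  <!-- Credentials for the s3n:// filesystem; for the s3:// scheme,
       use fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey instead. -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>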

Thanks, Rajiv

  • I also found that you need your S3 URL to start with s3n instead of s3 – viper Oct 30 '13 at 16:22
  • this worked for me: http://stackoverflow.com/questions/14681938/invalid-hostname-error-when-connecting-to-s3-sink-when-using-secret-key-having-f – Roshini Jul 20 '16 at 13:53

2 Answers


Regenerating my AWS Access Key and Secret so that there is no forward slash in the secret worked for me. Ref: https://issues.apache.org/jira/browse/HADOOP-3733
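If you want to check whether an existing secret is affected before regenerating it, a quick shell test along these lines should do (a sketch; the AWS_SECRET_ACCESS_KEY variable is assumed to already hold your secret):

# Warn if the secret contains a forward slash (the HADOOP-3733 case).
case "$AWS_SECRET_ACCESS_KEY" in
  */*) echo "secret contains a '/': affected by HADOOP-3733" ;;
  *)   echo "no forward slash: safe to embed in the URI" ;;
esac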

RAbraham

An alternative to regenerating the key, which worked for me, was to pass the credentials directly with the -Dfs.s3n.awsAccessKeyId= and -Dfs.s3n.awsSecretAccessKey= flags when running distcp.

Example: hadoop distcp -Dfs.s3n.awsAccessKeyId= -Dfs.s3n.awsSecretAccessKey= s3n://path/to/log/dir hdfs://hdfs-node:8020/logs/

Note the use of s3n, which has a 5 GB per-file limitation: Difference between Amazon S3 and S3n in Hadoop

Edit: Do not URL-encode the secret access key; slashes ("/") and pluses ("+") should be passed as they are!
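For example, a sketch with placeholder credentials (single-quoting the secret keeps the shell from interpreting any special characters, so the raw "/" and "+" go through unencoded):

hadoop distcp \
  -Dfs.s3n.awsAccessKeyId='YOUR_ACCESS_KEY_ID' \
  -Dfs.s3n.awsSecretAccessKey='YOUR/SECRET+KEY' \
  s3n://your-bucket/path/to/log/dir \
  hdfs://hdfs-node:8020/logs/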

Garren S
  • The last solution won't work; the Hadoop filesystem still complains: java.lang.IllegalArgumentException: Wrong FS – tribbloid Feb 18 '15 at 21:34