
I am trying to run a custom JAR on an Amazon EMR cluster, passing S3 paths as the JAR's input and output parameters (-input s3n://s3_bucket_name/ldas/in -output s3n://s3_bucket_name/ldas/out).

When the cluster runs this Custom JAR, the following exception occurs.

    Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://s3_bucket_name/ldas/out, expected: hdfs://10.214.245.187:9000
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:644)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:181)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:92)
    at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:585)
    at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:581)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:581)
    at cc.mrlda.ParseCorpus.run(ParseCorpus.java:101)
    at cc.mrlda.ParseCorpus.run(ParseCorpus.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at cc.mrlda.ParseCorpus.main(ParseCorpus.java:727)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

How can I correct this error? How can I use an s3n bucket as the filesystem in Amazon EMR? I also think changing the default filesystem to the S3 bucket would be a good idea, but I am not sure how to do that.

ilam

1 Answer


I'd suggest checking that your JAR processes its parameters in the same way as shown here: http://java.dzone.com/articles/running-elastic-mapreduce-job

Specifically,

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
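
If it helps, here is a minimal driver sketch along those lines (the class name and job wiring are illustrative, not taken from the cc.mrlda code): the arguments are wrapped in Path objects as-is, so an s3n:// URI is resolved against the S3 filesystem instead of the cluster's default HDFS.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyEmrJob extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = Job.getInstance(getConf(), "my-emr-job");
            job.setJarByClass(MyEmrJob.class);

            // args[0] and args[1] can be s3n:// or hdfs:// URIs; the URI scheme
            // decides which FileSystem implementation Hadoop uses for each path.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyEmrJob(), args));
        }
    }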

Alternatively, I've had success adding custom script-runner steps to copy files from S3 to HDFS and vice versa. Particularly if you have a few streaming steps in a row, it's helpful to keep things on HDFS. You should be able to make a simple bash script with something like

    hadoop fs -cp s3://s3_bucket_name/ldas/in hdfs:///ldas/in

and

    hadoop fs -cp hdfs:///ldas/out s3://s3_bucket_name/ldas/out

Then set up your streaming step in between to read from hdfs:///ldas/in and write to hdfs:///ldas/out.
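
For example, the copy-in step could be a script along these lines (bucket and directory names are just the placeholders from above; the mirror-image script would copy hdfs:///ldas/out back to S3 after the job finishes):

    #!/bin/bash
    # copy-in.sh: run as a script-runner step before the main job so that the
    # input already sits on HDFS. Bucket and directory names are placeholders.
    set -e
    hadoop fs -mkdir -p hdfs:///ldas
    hadoop fs -cp s3://s3_bucket_name/ldas/in hdfs:///ldas/in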

MattyB