We are trying to set up Cloudera 5.5 where HDFS will work on S3 only. For that we have already configured the necessary properties in core-site.xml:

<property>
    <name>fs.s3a.access.key</name>
    <value>################</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>###############</value>
</property>
<property>
    <name>fs.default.name</name>
    <value>s3a://bucket_Name</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>s3a://bucket_Name</value>
</property>

After setting this up, we were able to browse the files of the S3 bucket with the command

hadoop fs -ls /

It shows only the files available on S3.

But when we start the YARN services, the JobHistory Server fails to start with the error below, and we get the same error when launching Pig jobs:

PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
ERROR   org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils   
Unable to create default file context [s3a://kyvosps]
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)

Searching on the Internet, we found that we need to set the following properties as well in core-site.xml:

<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The implementation class of the S3A Filesystem</description>
</property>
<property>
    <name>fs.AbstractFileSystem.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    <description>The FileSystem for  S3A Filesystem</description>
</property>

After setting the above properties, we are getting the following error:

org.apache.hadoop.service.AbstractService   
Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:131)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:157)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:473)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getDefaultFileContext(JobHistoryUtils.java:247)

The jars needed for this are in place, but we are still getting the error. Any help will be great. Thanks in advance.

Update

I tried removing the property fs.AbstractFileSystem.s3a.impl, but it gives me the same first exception that I was getting previously:

org.apache.hadoop.security.UserGroupInformation 
PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
ERROR   org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils   
Unable to create default file context [s3a://bucket_name]
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:473)
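
From what we can tell from the stack trace, the JobHistory Server goes through FileContext, which maps a URI scheme to a file system via the fs.AbstractFileSystem.&lt;scheme&gt;.impl key. Below is a simplified sketch of that lookup (not the verbatim Hadoop source; the key and message are taken from the trace above), assuming hadoop-common is on the classpath:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.UnsupportedFileSystemException;

class SchemeLookupSketch {
    // Roughly what AbstractFileSystem.createFileSystem does: look up the class bound
    // to the URI scheme and fail if nothing is configured for it.
    static Class<?> resolveAbstractFileSystem(URI uri, Configuration conf)
            throws UnsupportedFileSystemException {
        String scheme = uri.getScheme();                      // "s3a" for s3a://bucket_name
        String key = "fs.AbstractFileSystem." + scheme + ".impl";
        Class<?> clazz = conf.getClass(key, null);            // null when the key is absent
        if (clazz == null) {
            // This is (roughly) where "No AbstractFileSystem for scheme: s3a" comes from.
            throw new UnsupportedFileSystemException(
                    "No AbstractFileSystem for scheme: " + scheme);
        }
        return clazz;
    }
}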

Answer

The problem is not with the location of the jars.

The problem is with the setting:

<property>
    <name>fs.AbstractFileSystem.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    <description>The FileSystem for  S3A Filesystem</description>
</property>

This setting is not needed. Because of this setting, the framework searches for the following constructor in the S3AFileSystem class, and there is no such constructor:

S3AFileSystem(URI theUri, Configuration conf);

The following exception clearly indicates that it is unable to find a constructor for S3AFileSystem with URI and Configuration parameters:

java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
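
Per the stack trace, this comes from AbstractFileSystem.newInstance, which instantiates the configured class reflectively. A simplified sketch of that step (illustrative only, not the verbatim Hadoop code):

import java.lang.reflect.Constructor;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;

class NewInstanceSketch {
    // FileContext expects the class named by fs.AbstractFileSystem.s3a.impl to offer a
    // (URI, Configuration) constructor. S3AFileSystem is a FileSystem, not an
    // AbstractFileSystem, and has no such constructor, hence the NoSuchMethodException.
    static Object newAbstractFileSystem(Class<?> clazz, URI uri, Configuration conf)
            throws Exception {
        Constructor<?> ctor = clazz.getDeclaredConstructor(URI.class, Configuration.class);
        ctor.setAccessible(true);
        return ctor.newInstance(uri, conf);
    }
}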

To resolve this problem, remove the fs.AbstractFileSystem.s3a.impl setting from core-site.xml. Just having the fs.s3a.impl setting in core-site.xml should solve your problem.

EDIT: org.apache.hadoop.fs.s3a.S3AFileSystem just implements FileSystem.

Hence, you cannot set the value of fs.AbstractFileSystem.s3a.impl to org.apache.hadoop.fs.s3a.S3AFileSystem, since org.apache.hadoop.fs.s3a.S3AFileSystem does not implement AbstractFileSystem.

I am using Hadoop 2.7.0, and in this version s3a is not exposed as an AbstractFileSystem.
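
You can verify the hierarchy mismatch yourself; a small check along these lines (assuming hadoop-common and hadoop-aws are on the classpath) prints true for FileSystem and false for AbstractFileSystem:

import org.apache.hadoop.fs.AbstractFileSystem;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

public class HierarchyCheck {
    public static void main(String[] args) {
        // S3AFileSystem extends FileSystem, so it satisfies fs.s3a.impl ...
        System.out.println(FileSystem.class.isAssignableFrom(S3AFileSystem.class));           // true
        // ... but it is not an AbstractFileSystem, so it cannot back
        // fs.AbstractFileSystem.s3a.impl.
        System.out.println(AbstractFileSystem.class.isAssignableFrom(S3AFileSystem.class));    // false
    }
}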

There is a JIRA ticket, https://issues.apache.org/jira/browse/HADOOP-11262, to implement this, and the fix is available in Hadoop 2.8.0.

Assuming your jar has exposed s3a as an AbstractFileSystem, you need to set the following for fs.AbstractFileSystem.s3a.impl:

<property>
    <name>fs.AbstractFileSystem.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3A</value>
</property>
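
For reference, the fix in HADOOP-11262 works by adding a thin AbstractFileSystem adapter (org.apache.hadoop.fs.s3a.S3A) that delegates to the existing S3AFileSystem. A rough sketch of what such an adapter looks like, reconstructed from memory of the public sources, so details may differ slightly from the released 2.8.0 code:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

// The real class lives in the org.apache.hadoop.fs.s3a package so that the
// fs.AbstractFileSystem.s3a.impl value above resolves to it.
public class S3A extends DelegateToFileSystem {
    public S3A(URI theUri, Configuration conf) throws IOException, URISyntaxException {
        // DelegateToFileSystem provides the AbstractFileSystem surface by wrapping an
        // ordinary FileSystem implementation and registering it for the "s3a" scheme.
        super(theUri, new S3AFileSystem(), conf, "s3a", false);
    }
}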

That will solve your problem.

  • I tried that as well; it gives me the error "PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a". That is the only reason I added the property to core-site.xml. I will update my question as well. – Vikas Hardia Dec 17 '15 at 05:06
  • Can you tell me where you are picking up the jar for the s3a file system from? Then I can definitely solve your problem. I was looking at the implementation here: https://github.com/Aloisius/hadoop-s3a, which just implements FileSystem. – Manjunath Ballur Dec 17 '15 at 05:08
  • The jar is in the default Cloudera location, which is "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop"; the jar name is hadoop-aws-2.6.0-cdh5.5.0.jar – Vikas Hardia Dec 17 '15 at 05:14
  • Thanks for your efforts and time we really appreciate it – Vikas Hardia Dec 17 '15 at 05:17
  • I am using Hadoop 2.6.0 so we need to wait till 2.8.0. Thanks for your help and time – Vikas Hardia Dec 17 '15 at 05:54
  • I tried that as well; I get the error "Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3A not found" – Vikas Hardia Dec 17 '15 at 06:54
  • So in conclusion, we cannot replace the local disk with S3 for HDFS; we can only use S3 for input and output of MapReduce, right? Or is there any other way to do that? – Vikas Hardia Dec 17 '15 at 07:36
  • Replace S3 with local disk? I am not clear about this statement. I see that you are facing an exception when the Job History server tries to write to S3. I have used S3 only for MapReduce jobs. Like I mentioned in the answer, I guess YARN support for an s3a AbstractFileSystem implementation exists only from Hadoop 2.8.0. – Manjunath Ballur Dec 17 '15 at 08:39
  • What we are trying to achieve is to use S3 instead of local storage, and for that we've overridden the property fs.defaultFS and provided the S3 details in place of namenodeip:8020. – Vikas Hardia Dec 17 '15 at 12:03
  • OK. Got it. Let me dig deep into it. I see that the exception is coming from the JobHistory Server: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration). I will go through this code and check if I can find something more. – Manjunath Ballur Dec 17 '15 at 18:11