I am getting the error below when running the following command through spark-shell. I have also added the maprfs JAR to my .bash_profile, as shown below. I tried most of the solutions from similar posts, but I am unable to fix this.

scala> val input = sc.textFile("maprfs:///user/uber/list/brand.txt")
input: org.apache.spark.rdd.RDD[String] = maprfs:///user/uber/list/brand.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> input.count()
java.io.IOException: No FileSystem for scheme: maprfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
  ... 49 elided

bash_profile:

export MAPR_HOME=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar
export PATH=$MAPR_HOME:$PATH

Dookoto_Sea

2 Answers


If you look at the Spark architecture, you will see that you have a driver and executors. When you set an environment variable the way you did, it only affects your driver, not the executors; see the sketch below for one way to reach both.

Look at this question. This should help you.
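
For example, one way to get the jar onto both classpaths is to pass it when launching spark-shell instead of through .bash_profile. A minimal sketch, assuming the jar path from the question is the right one for your installation:

spark-shell \
  --conf spark.driver.extraClassPath=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar \
  --conf spark.executor.extraClassPath=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar

spark.driver.extraClassPath and spark.executor.extraClassPath prepend entries to the driver and executor JVM classpaths, which is where Hadoop's FileSystem loader looks when it resolves the maprfs: scheme. Passing --jars with the same path is an alternative that also distributes the jar to the executors.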

jgp

This looks like you are using a build of Spark that does not have the various MapR jars on its classpath. It is hard to tell for sure, since you don't provide any information about which versions of Spark and MapR you are using.

Have you tried the MapR-supplied version of Spark?
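
To check whether the MapR jar is actually visible, you can probe for the filesystem class from the spark-shell prompt. A rough sketch; com.mapr.fs.MapRFileSystem is the class name MapR ships for its FileSystem implementation, so treat it as an assumption if your release differs:

scala> Class.forName("com.mapr.fs.MapRFileSystem")
// throws ClassNotFoundException if the jar is missing from the driver classpath

scala> sc.hadoopConfiguration.set("fs.maprfs.impl", "com.mapr.fs.MapRFileSystem")
// maps the maprfs: scheme to its implementation via Hadoop's fs.<scheme>.impl convention

Even with the fs.maprfs.impl mapping set, the class still has to be present on both the driver and executor classpaths, so this is a diagnostic aid rather than a substitute for fixing the classpath.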

Ted Dunning