I am getting the error below when running the following command through spark-shell. I have also added the maprfs JAR to my .bash_profile, as shown below. I tried most of the solutions from similar posts, but I am unable to fix this.

scala> val input = sc.textFile("maprfs:///user/uber/list/brand.txt")
input: org.apache.spark.rdd.RDD[String] = maprfs:///user/uber/list/brand.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> input.count()
java.io.IOException: No FileSystem for scheme: maprfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
  ... 49 elided

bash_profile:

export MAPR_HOME=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar
export PATH=$MAPR_HOME:$PATH

Dookoto_Sea

2 Answers


If you look at the Spark architecture, you will see that you have a driver and executors. When you set an environment variable the way you did, it only affects your driver, not the executors; see the sketch below for one way to reach both.

Look at this question. This should help you.
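
For example, one way to get the jar onto both classpaths is to pass it when launching spark-shell instead of through .bash_profile. A minimal sketch, assuming the jar path from the question is the right one for your installation:

spark-shell \
  --conf spark.driver.extraClassPath=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar \
  --conf spark.executor.extraClassPath=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar

spark.driver.extraClassPath and spark.executor.extraClassPath prepend entries to the driver and executor JVM classpaths, which is where Hadoop's FileSystem loader looks when it resolves the maprfs: scheme. Passing --jars with the same path is an alternative that also distributes the jar to the executors.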

jgp

This looks like you are using a build of Spark that does not have the various MapR jars on its classpath. It is hard to tell for sure, since you don't provide any information about which versions of Spark and MapR you are using.

Have you tried the MapR-supplied version of Spark?
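
To check whether the MapR jar is actually visible, you can probe for the filesystem class from the spark-shell prompt. A rough sketch; com.mapr.fs.MapRFileSystem is the class name MapR ships for its FileSystem implementation, so treat it as an assumption if your release differs:

scala> Class.forName("com.mapr.fs.MapRFileSystem")
// throws ClassNotFoundException if the jar is missing from the driver classpath

scala> sc.hadoopConfiguration.set("fs.maprfs.impl", "com.mapr.fs.MapRFileSystem")
// maps the maprfs: scheme to its implementation via Hadoop's fs.<scheme>.impl convention

Even with the fs.maprfs.impl mapping set, the class still has to be present on both the driver and executor classpaths, so this is a diagnostic aid rather than a substitute for fixing the classpath.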

Ted Dunning