
I am trying to load Parquet files from S3 in spark-shell on EMR and save them as a table.

Command:

# start spark-shell with the PostgreSQL JDBC driver
spark-shell --driver-class-path postgresql-42.2.5.jar --jars postgresql-42.2.5.jar
// inside spark-shell: read the Parquet data and save it as a table
spark.read.parquet("s3://file_location/").write.saveAsTable("table_name")

Error:

19/04/11 10:08:41 WARN FileStreamSink: Error while looking for metadata directory.
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2702)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2715)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2751)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2733)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:377)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:348)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:348)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:559)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:543)
  ... 48 elided
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
  at org.apac

  • When I run spark-shell without the PostgreSQL jar, I can load the data from S3.
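Because the error only appears when the jar is passed, one plausible cause (an assumption, not confirmed in this thread) is that `--driver-class-path` sets `spark.driver.extraClassPath` and therefore replaces the value EMR writes to `spark-defaults.conf`, which normally includes the EMRFS jars that contain `EmrFileSystem`. A minimal sketch under that assumption: read the existing value and prepend the PostgreSQL jar instead of overwriting it. The helper name `merged_driver_classpath` is hypothetical, and `/etc/spark/conf/spark-defaults.conf` is assumed to be the config location on the cluster.

```shell
#!/usr/bin/env bash
# Sketch, assuming EMR's spark-defaults.conf already defines
# spark.driver.extraClassPath (including the EMRFS jars) and that
# --driver-class-path overrides it instead of appending to it.
# merged_driver_classpath is a hypothetical helper, not an EMR or Spark tool.

# Print "<jar>:<existing spark.driver.extraClassPath>" so the defaults are kept.
# Assumes whitespace-separated "key value" lines, as in a typical spark-defaults.conf.
merged_driver_classpath() {
  local conf_file="$1" jar="$2" existing
  # Take the first matching value; a missing file just yields an empty string.
  existing=$(awk '$1 == "spark.driver.extraClassPath" { print $2; exit }' "$conf_file" 2>/dev/null)
  if [ -n "$existing" ]; then
    printf '%s:%s\n' "$jar" "$existing"
  else
    printf '%s\n' "$jar"
  fi
}

# Usage on the cluster (config path is an assumption):
# spark-shell \
#   --driver-class-path "$(merged_driver_classpath /etc/spark/conf/spark-defaults.conf postgresql-42.2.5.jar)" \
#   --jars postgresql-42.2.5.jar
```

Since the spark-submit documentation notes that jars passed with `--jars` are already added to the classpath, simply dropping `--driver-class-path` may also be worth trying.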

Any help would be appreciated.

SCouto
bob

0 Answers