
I'm performing many joins on DataFrames using Spark in Scala. When I try to get the count of the final DataFrame I'm generating, I get the following exception. I'm running the code using spark-shell.

I've tried passing some configuration parameters like the following when starting spark-shell, but none of them worked. Is there anything I'm missing here?

--conf "spark.driver.extraLibraryPath=/usr/hdp/2.6.3.0-235/hadoop/lib/native/"
--jars /usr/hdp/current/hadoop-client/lib/snappy-java-1.0.4.1.jar 

Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
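
One way I've found to confirm that those launch options actually reached the driver is to check from inside spark-shell (a small sketch; it only inspects the driver JVM's system properties and the launch-time Spark conf):

    // Paste into spark-shell: verify the launch options were applied.
    // java.library.path should include the path passed via
    // spark.driver.extraLibraryPath
    println(sys.props.getOrElse("java.library.path", "<not set>"))

    // The raw Spark conf value, if it was set at launch
    println(sc.getConf.getOption("spark.driver.extraLibraryPath"))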

  • Snappy is a native library: it is written in C or another language that can be compiled to machine code and installed as a shared `.so` file on your system. The Java library is just a wrapper over the real Snappy that eases the process of calling it. As the error message says, your Hadoop distribution was built without support for Snappy. You may try asking on a Hortonworks forum. – Luis Miguel Mejía Suárez Apr 08 '19 at 14:54
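
Following up on that comment, a quick way to see which half of the check is failing, straight from spark-shell (a sketch; both calls are static helpers that exist in Hadoop 2.x):

    import org.apache.hadoop.util.NativeCodeLoader
    import org.apache.hadoop.io.compress.SnappyCodec

    // true only if libhadoop.so was found and loaded by this JVM
    println(s"libhadoop loaded: ${NativeCodeLoader.isNativeCodeLoaded}")
    // true only if that libhadoop was built with Snappy support
    println(s"native snappy usable: ${SnappyCodec.isNativeCodeLoaded}")

If the first line prints false, adding snappy-java via --jars won't help: the failing check is for Hadoop's native C library, not for anything on the Java classpath. The same information is reported by the hadoop checknative -a command.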

1 Answer


Try updating the Hadoop jar files from 2.6.3 to 2.8.0 or 3.0.0. There was a bug in those earlier versions of Hadoop: the native snappy library was not available. After replacing the Hadoop core jar, you should be able to perform snappy compression/decompression.
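
If upgrading the Hadoop jars isn't possible right away, one possible stopgap is to switch the output compression codec; this is only a sketch, and it assumes the Snappy data is produced by the job itself (it won't help when reading pre-existing Snappy-compressed input, which still needs native snappy):

    // Stopgap sketch: write Parquet with gzip instead of the snappy default
    spark.conf.set("spark.sql.parquet.compression.codec", "gzip")

    // joinedDf and the output path are hypothetical placeholders
    joinedDf.write.parquet("/tmp/joined_output")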