Spark error with google/guava library: java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite

Question

I have a simple spark project - in which in the pom.xml the dependencies are only the basic scala, scalatest/junit, and spark:

    <dependency>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.binary.version}</artifactId>
        <version>3.0.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>compile</scope>
    </dependency>
</dependencies>

When attempting to run a basic spark program the SparkSession init fails on this line:

 SparkSession.builder.master(master).appName("sparkApp").getOrCreate

Here is the output / error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/07 18:06:15 INFO SparkContext: Running Spark version 2.2.1
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder
.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)
Lcom/google/common/cache/CacheBuilder;
    at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:73)

at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)

I have run spark locally many dozens of times on other projects, what might be wrong with this simple one? Is there a dependency on $HADOOP_HOME environment variable or similar?

Update By downgrading the spark version to 2.0.1 I was able to compile. That does not fix the problem (we need newer) version. But it helps point out the source of the problem

Another update In a different project the hack to downgrade to 2.0.1 does help - i.e. execution proceeds further : but then when writing out to parquet a similar exception does happen.

8/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
    at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
    at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

score 2 · Answer 1 · answered May 15 '18 at 06:11

2

This error occurs due to version mismatch between Google's guava library and Spark. Spark shades guava but many libraries use guava. You can try Shading the Guava dependencies as per this post. Apache-Spark-User-List

answered May 15 '18 at 06:11

Yayati Sule

1,601
13
25

I mentioned in the post and showed in the `pom.xml` that no additional /outside-of-spark dependencies were added. This question is specifically about what *inside* of spark/spark-dependencies is causing this to happen. – WestCoastProjects May 15 '18 at 07:47
Guava comes bundled with many dependencies hence the solution to Shade it in your pom.xml file – Yayati Sule May 15 '18 at 08:02
Your comment did not actually answer my comment ;) – WestCoastProjects May 15 '18 at 08:24

S S · Answer 2 · 2020-01-15T12:25:03.473

Adding shade plugin to your pom file and relocating google package can resolve this issue.

More information can found here and here

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
            <relocations>
                <relocation>
                    <pattern>com.google.common</pattern>
                    <shadedPattern>shade.com.google.common</shadedPattern>
                </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>

If this also doesn't help then adding guava library of version 15.0 works nicely. The reason of this work around is in dependencyManagement. The nice SO answer is here

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>15.0</version>
        </dependency>
    </dependencies>
</dependencyManagement>

score 0 · Answer 3 · answered Jun 27 '19 at 08:07

I am getting this error in spring boot : java.lang.TypeNotPresentException: Type com.google.common.cache.CacheBuilderSpec com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache

The issue is due to "com.google.guava:guava" api. In springboot this api comes under some other api might be "spring-boot-starter-web" or "springfox-swagger2" api so we need to first exclude guava api from springfox-swagger2 jar and need to add updated version of guava api.spring-data-mongodb

Solution: 1. add guava dependency on the top of all the dependency so that springboot can ge the latest version:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>19.0</version>
</dependency>

Find out the spring boot dependecy where artifactId: "guava" included then exlude "guava" artifact from that dependency and then add the guava dependency like above.

I'm not using `spring boot`: can you explain why you added this answer? — WestCoastProjects, Jun 27 '19 at 13:38

Spark error with google/guava library: java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite

3 Answers3

Linked