
I'm parsing a JSON file with Spark SQL and it works really well: it infers the schema and I can run queries against it.

Now I need to flatten the JSON, and I have read on the forum that the best way is to explode it with Hive's LATERAL VIEW, so I am trying to do the same. But I can't even create the context... Spark gives me an error and I can't figure out how to fix it.
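
For context, this is roughly the kind of flattening query I'm aiming for once the HiveContext works. It's only a sketch: the events table and its items array column are made-up names for illustration, not my real schema.

// Hypothetical example: flatten an array column with LATERAL VIEW explode.
// "events" and "items" are illustrative names only.
val flattened = hiveContext.sql(
  """SELECT id, item
    |FROM events
    |LATERAL VIEW explode(items) itemsTable AS item""".stripMargin)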

As I said, at this point I'm only trying to create the context:

println ("Create Spark Context:")
val sc = new SparkContext( "local", "Simple", "$SPARK_HOME")
println ("Create Hive context:")
val hiveContext = new HiveContext(sc)

And it gives me this error:

Create Spark Context:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/26 15:13:44 INFO Remoting: Starting remoting
15/12/26 15:13:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.80.136:40624]

Create Hive context:
15/12/26 15:13:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/12/26 15:13:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/26 15:13:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:14:01 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/26 15:14:01 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Exception in thread "main" java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
  at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
  at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
  at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
  at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:174)
  at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:177)
  at pebd.emb.Bicing$.main(Bicing.scala:73)
  at pebd.emb.Bicing.main(Bicing.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.OutOfMemoryError: PermGen space

Process finished with exit code 1

I know it is a very simple question, but I don't really know the reason for this error. Thanks in advance, everyone.

EMBorque

1 Answer


Here's the relevant part of the exception:

Caused by: java.lang.OutOfMemoryError: PermGen space

You need to increase the amount of PermGen memory that you give to the JVM. By default (SPARK-1879), Spark's own launch scripts increase this to 128 MB, so I think you'll have to do something similar in your IntelliJ run configuration. Try adding -XX:MaxPermSize=128m to the "VM options" list.
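
If you launch the application through sbt rather than IntelliJ, the same idea applies to the forked JVM. A minimal sketch of the equivalent build.sbt settings (assuming you start the app with sbt run):

// build.sbt (sketch): fork a separate JVM for `run` so the option takes effect,
// then raise the PermGen ceiling the same way Spark's launch scripts do.
fork := true
javaOptions += "-XX:MaxPermSize=128m"

Note that on Java 8 this flag is ignored, since the permanent generation was removed there entirely.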

Josh Rosen
  • Thanks Josh, I was looking in that direction and I found this post: [link](http://stackoverflow.com/questions/19750340/solve-permgen-errors-when-building-in-intellij-with-maven). But although I'm doing exactly that (with 512M, even with 1024M), I get the same error. I have never had problems with SQLContext, but this is my first time with HiveContext... – EMBorque Dec 27 '15 at 12:14
  • 1
    You should also consider switching to java 8 where permanent generation space is removed (see http://stackoverflow.com/questions/18339707/permgen-elimination-in-jdk-8) – thoredge Dec 27 '15 at 12:35
  • I'm now trying with Java 8 and it seems to work. Thanks thoredge and @Josh for your help!! – EMBorque Dec 27 '15 at 14:04
  • Thanks, Java 7 and a 512m PermGen size worked in my case. – halil Aug 05 '16 at 07:52