
Spark core supports both raw (deserialized) and serialized RDD caching. This article explains it well.
If you use persist, you can specify any of the storage levels, but if you use cache you get only the default MEMORY_ONLY level, without serialization, according to this question.
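For example, with plain RDDs I can pick a serialized level explicitly (a minimal sketch, assuming an existing SparkContext named sc):

import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 100)
rdd.persist(StorageLevel.MEMORY_ONLY_SER) // partitions stored as serialized bytes
// rdd.cache() is just shorthand for persist(StorageLevel.MEMORY_ONLY), i.e. deserialized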
Does anyone know how to use MEMORY_ONLY_SER caching in SparkSQL?
cacheTable("my_table") caches with the MEMORY_ONLY option, not with MEMORY_ONLY_SER, even though I have spark.storage.StorageLevel=MEMORY_ONLY_SER set as the default in my Spark environment.
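For illustration, this is roughly the call I am making (a minimal sketch, assuming a HiveContext named hiveContext and a table registered as "my_table"):

// Caches the table's data in memory, but this method takes no
// storage level argument, so the default level is always used.
hiveContext.cacheTable("my_table")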
My goal is to use serialized table caching from SparkSQL. Any ideas?

seaman29

1 Answer


Try the following:

Java:

// Requires imports: scala.Option, scala.Some, org.apache.spark.storage.StorageLevel
Option<String> tableName = new Some<String>("MyTable");
hiveContext.cacheManager().cacheQuery(hiveContext.table(tableName.get()), tableName, StorageLevel.MEMORY_ONLY_SER());

Scala:

import org.apache.spark.storage.StorageLevel.MEMORY_ONLY_SER
sqlContext.cacheManager.cacheQuery(sqlContext.table(tableName), Some(tableName), MEMORY_ONLY_SER)

Spark 1.3.0+:
See the 1.3.1 branch code.
See the 1.5.1 branch code (validate with hiveContext.isCached(tableName)).
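To confirm the cache actually took effect, here is a minimal check (assuming the hiveContext and tableName from the snippets above):

// Returns true once cacheQuery has registered the table's plan in the cache.
val cached: Boolean = hiveContext.isCached(tableName)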

Leet-Falcon