Spark core supports both raw (deserialized) and serialized RDD caching. This article explains the difference well.
If you use persist, you can specify any of the storage levels, but if you use cache you get MEMORY_ONLY by default, without serialization, according to this question.
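To make the difference concrete, here is a minimal RDD-level sketch (the data is made up; sc is the SparkContext provided by spark-shell):

Scala:
import org.apache.spark.storage.StorageLevel

val raw = sc.parallelize(1 to 1000)
raw.cache() // same as persist(StorageLevel.MEMORY_ONLY): deserialized objects in memory

val ser = sc.parallelize(1 to 1000)
ser.persist(StorageLevel.MEMORY_ONLY_SER) // serialized bytes in memory: more compact, but costs CPU to deserialize on read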
Does anyone know how to use MEMORY_ONLY_SER caching in Spark SQL?
cacheTable(my_table) will cache with the MEMORY_ONLY option, not with MEMORY_ONLY_SER, even if I have spark.storage.StorageLevel=MEMORY_ONLY_SER set by default in the Spark environment.
My goal is to use serialized table caching from Spark SQL. Any ideas?
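For reference, all I am calling is this (table name made up):

Scala:
sqlContext.cacheTable("my_table") // caches the table, but, as described above, not with MEMORY_ONLY_SER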
- Did it work for you? – Leet-Falcon Oct 30 '15 at 10:18
- yep, thank you very much! – seaman29 Nov 01 '15 at 10:32
1 Answer
Try the following:
Java:
import scala.*; // for Option / Some
import org.apache.spark.storage.StorageLevel;

Option<String> tableName = new Some<String>("MyTable");
hiveContext.cacheManager().cacheQuery(hiveContext.table(tableName.get()), tableName, StorageLevel.MEMORY_ONLY_SER()); // explicit storage level, unlike cacheTable
Scala:
// cacheQuery is an internal Spark SQL API; MEMORY_ONLY_SER here is org.apache.spark.storage.StorageLevel.MEMORY_ONLY_SER
...cacheQuery(sqlContext.table(tableName), Some(tableName), MEMORY_ONLY_SER)
Spark 1.3.0+:
See the 1.3.1 branch code.
See the 1.5.1 branch code (validate with hiveContext.isCached(tableName)).
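A quick way to sanity-check the call (table name made up; the effective storage level is visible in the Storage tab of the Spark UI):

Scala:
hiveContext.isCached("MyTable") // true once cacheQuery has registered the table as cached
hiveContext.sql("SELECT COUNT(*) FROM MyTable").collect() // first action actually materializes the cache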

Leet-Falcon