Below is one approach, apart from SizeEstimator, that I use frequently.
The goal is to find out from code whether an RDD is cached and, more precisely, how many of its partitions are cached in memory and how many on disk. That means getting not only the storage level but also the current actual caching status and the memory consumption.
SparkContext has a developer API method, getRDDStorageInfo(), that you can use for this on occasion. Its documentation says:
Return information about what RDDs are cached, if they are in mem or
on disk, how much space they take, etc.
For example:
scala> sc.getRDDStorageInfo
res3: Array[org.apache.spark.storage.RDDInfo] =
Array(RDD "HiveTableScan [name#0], (MetastoreRelation sparkdb,
firsttable, None), None " (3) StorageLevel: StorageLevel(false, true, false, true, 1); CachedPartitions: 1;
TotalPartitions: 1;
MemorySize: 256.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B)
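The same information can be read field by field instead of relying on the printed string. The following is a minimal sketch, assuming a local SparkSession and a small example RDD named "numbers" (both are made up for illustration); it only uses the fields that RDDInfo exposes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheStatus {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for illustration.
    val spark = SparkSession.builder()
      .appName("cache-status")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical example RDD; replace with the RDD you want to inspect.
    val rdd = sc.parallelize(1 to 1000, 4)
      .setName("numbers")
      .persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count() // run an action so the partitions are actually cached

    // getRDDStorageInfo returns one RDDInfo per RDD known to be cached.
    sc.getRDDStorageInfo.filter(_.id == rdd.id).foreach { info =>
      println(s"RDD ${info.id} (${info.name})")
      println(s"  storage level:     ${info.storageLevel.description}")
      println(s"  cached partitions: ${info.numCachedPartitions} of ${info.numPartitions}")
      println(s"  memory size:       ${info.memSize} bytes")
      println(s"  disk size:         ${info.diskSize} bytes")
    }

    spark.stop()
  }
}
```

Note that rdd.getStorageLevel only tells you the requested storage level, while the RDDInfo entry reflects what has actually been cached so far.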
It seems the Spark UI also uses the same information. The following issue description is related:
Description
With SPARK-13992, Spark supports persisting data into
off-heap memory, but the usage of off-heap is not exposed currently,
it is not so convenient for user to monitor and profile, so here
propose to expose off-heap memory as well as on-heap memory usage in
various places:
- Spark UI's executor page will display both on-heap and off-heap memory usage.
- REST request returns both on-heap and off-heap memory.
- Also these two memory usage can be obtained programmatically from SparkListener.
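As a rough sketch of the SparkListener route, the listener below logs the on-heap and off-heap storage memory reported when a block manager registers. It assumes a Spark version in which SparkListenerBlockManagerAdded carries the optional maxOnHeapMem and maxOffHeapMem fields; the class name MemoryLoggingListener is hypothetical. These values are the maximum storage memory available, not the amount currently in use.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockManagerAdded}

// Hypothetical listener name; assumes maxOnHeapMem / maxOffHeapMem are
// present on the event (they are Option[Long] and may be empty).
class MemoryLoggingListener extends SparkListener {
  override def onBlockManagerAdded(event: SparkListenerBlockManagerAdded): Unit = {
    val executor = event.blockManagerId.executorId
    val onHeap   = event.maxOnHeapMem.map(b => s"$b bytes").getOrElse("unknown")
    val offHeap  = event.maxOffHeapMem.map(b => s"$b bytes").getOrElse("unknown")
    println(s"executor $executor: total=${event.maxMem} bytes, " +
      s"on-heap=$onHeap, off-heap=$offHeap")
  }
}
```

One way to register it is through the spark.extraListeners configuration (for example `--conf spark.extraListeners=MemoryLoggingListener`), so that it is attached before block managers register. A listener added later with sc.addSparkListener only sees block managers that join after that point.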