
I'm trying to find the size of a case class object in a Scala project using SizeEstimator, but it is giving unexpected results.

import org.apache.spark.util.SizeEstimator
case class event(imei: String, date: String)
// imei and date are String values defined elsewhere; imei is 15 characters long
val check = event(imei, date)
println("size is event obj " + SizeEstimator.estimate(check))
println("size is single " + SizeEstimator.estimate("a"))
println("size is imei " + SizeEstimator.estimate(imei))

It gives the following output:

size is event obj 520
size is single 48
size is imei 72

Why are these sizes so large? For a single character "a" I expected 1 byte, and my imei is a 15-character string, so I expected 15 bytes. Any suggestions, please? Thanks.

Pinnacle

1 Answer

scala> val char:java.lang.Character = 'a'
char: Character = a

scala> SizeEstimator.estimate(char)
res18: Long = 16

scala> SizeEstimator.estimate("A")
res19: Long = 48

If you want the actual Java heap size of a character, you have to declare it explicitly with a Java type such as java.lang.Character. SizeEstimator.estimate requires an AnyRef, so passing a single-quoted Char literal on its own does not compile:

scala> SizeEstimator.estimate('A')
<console>:27: error: type mismatch;
 found   : Char('A')
 required: AnyRef
Note: an implicit exists from scala.Char => java.lang.Character, but
methods inherited from Object are rendered ambiguous.  This is to avoid
a blanket implicit which would convert any scala.Char to any AnyRef.
You may wish to use a type ascription: `x: java.lang.Character`.
              SizeEstimator.estimate('A')
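
The type ascription suggested by the compiler message can be tried without Spark. This is only a sketch: `describe` is a hypothetical stand-in that mimics a method taking an `AnyRef` parameter, like the one in the error above; it is not part of Spark.

```scala
// Hypothetical stand-in: accepts only reference types (AnyRef), as in the error above.
def describe(obj: AnyRef): String = obj.getClass.getName

// A type ascription triggers the implicit scala.Char => java.lang.Character conversion.
val ascribed = describe('A': java.lang.Character)

// Declaring the value with the Java type up front works the same way.
val boxed: java.lang.Character = 'A'

println(ascribed)         // java.lang.Character
println(describe(boxed))  // java.lang.Character
```

The same ascription should work when the value is passed to SizeEstimator.estimate, which is what the first REPL example above does with `val char: java.lang.Character = 'a'`.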

In general, the minimum memory usage of a String is given by the following formula:

Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)
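
As a rough check, the formula can be evaluated for the two strings in the question. The sketch below also adds a second estimate that is an assumption on my part, not something taken from the Spark source: a 64-bit HotSpot JVM with compressed oops and a Java 8 style String (12-byte object header, a 4-byte reference to a char[] and a 4-byte hash; the char[] has a 16-byte header and 2 bytes per char), with each object rounded up to a multiple of 8. The names `minStringBytes`, `alignTo8` and `layoutEstimate` are made up for illustration.

```scala
// Formula from above: 8 * (int) (((chars * 2) + 45) / 8), using integer division.
def minStringBytes(chars: Int): Int = 8 * ((chars * 2 + 45) / 8)

// Round up to the next multiple of 8 (assumed JVM object alignment).
def alignTo8(bytes: Long): Long = (bytes + 7) / 8 * 8

// Assumed layout: 64-bit HotSpot, compressed oops, Java 8 String backed by a char[].
def layoutEstimate(chars: Int): Long = {
  val stringObject = alignTo8(12 + 4 + 4)                 // header + value reference + hash
  val charArray    = alignTo8(16) + alignTo8(2L * chars)  // array header + 2 bytes per char
  stringObject + charArray
}

println(minStringBytes(1))   // 40
println(minStringBytes(15))  // 72
println(layoutEstimate(1))   // 48
println(layoutEstimate(15))  // 72
```

Under these assumptions the per-object rounding reproduces the 48 and 72 reported in the question, while the formula alone gives 40 for a one-character string, so the formula is best read as a lower bound rather than the exact value SizeEstimator reports.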

  • Thank you, Manohar, for your help. I have just one doubt: why does SizeEstimator.estimate("A") give 48 when the string contains only one character? – Pinnacle May 15 '18 at 08:46
  • It uses the formula mentioned above and rounds up to a multiple of 8, though I am not sure at which point it rounds up. – manohar amrutkar May 16 '18 at 10:00