It is MEMORY_ONLY by default. Check out the source code; it is in Scala, but simple:
def cache(): this.type = persist()
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)
def persist(newLevel: StorageLevel): this.type = {
  // doing stuff...
}
The storage level you should use depends on the RDD itself. For example, with MEMORY_ONLY and not enough RAM, partitions that don't fit are simply dropped, so they have to be recomputed from the lineage the next time they are needed. With MEMORY_AND_DISK, the partitions that don't fit in memory are spilled to disk instead, so they can be read back from the hard disk rather than recomputed.
Most of the time, recomputing the data is faster than reading it back from disk (and persisting to disk means writing it out first, which is even slower). That's why MEMORY_ONLY is the default.
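For illustration, here is a minimal sketch of that trade-off (the app name, the local[*] master, and the sample RDD are just assumptions, not part of the question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-example").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // An RDD with some work in its lineage; without caching, the map would be
    // recomputed on every action.
    val squares = sc.parallelize(1 to 1000000).map(x => x.toLong * x)

    // Default behaviour: cache() is the same as persist(StorageLevel.MEMORY_ONLY).
    // Partitions that don't fit in RAM are simply not stored and are recomputed
    // from the lineage when needed again.
    squares.cache()

    // Alternative: spill partitions that don't fit in memory to local disk,
    // trading recomputation for disk I/O. (The level can only be assigned once
    // per RDD, which is why this line is shown commented out.)
    // squares.persist(StorageLevel.MEMORY_AND_DISK)

    println(squares.count()) // first action computes and stores the partitions
    println(squares.count()) // second action reads them from the cache

    squares.unpersist()      // free the storage when it is no longer needed
    sc.stop()
  }
}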
The differences between the storage levels are described in the official guide:
https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence