Spark is lazy, right? So what does load() actually do?
import timeit

start = timeit.default_timer()
reader = sqlContext.read.option(
    "es.resource", indexes
).format("org.elasticsearch.spark.sql")
end = timeit.default_timer()
print('without load: ', end - start)  # almost instant

start = timeit.default_timer()
df = reader.load()
end = timeit.default_timer()
print('load: ', end - start)  # takes ~1 sec

start = timeit.default_timer()
df.show()
end = timeit.default_timer()
print('show: ', end - start)  # takes ~4 sec
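As an aside, the repeated start/end timing pairs can be wrapped in a small helper. This is a pure-Python sketch using only the standard library (no Spark required); the `timed` name and the `sink` parameter are my own choices, not part of any Spark API:

```python
import timeit
from contextlib import contextmanager

@contextmanager
def timed(label, sink=print):
    # Measure wall-clock time around the enclosed block, mirroring
    # the start/end pairs in the snippet above.
    start = timeit.default_timer()
    yield
    sink(f"{label}: {timeit.default_timer() - start:.3f}s")

# usage:
# with timed("load"):
#     df = reader.load()
```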
If show() were the only action, I wouldn't expect load() to take as long as 1 sec. So I'm inclined to conclude that load() is an action (as opposed to a transformation in Spark).
Does load() actually load the whole dataset into memory? I don't think so, but then what does it do?
I've searched and looked at the docs (https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html) but they don't help.