Update 2022-08-31
While the chosen answer did confirm that the behavior is similar, there is a blind spot: the cache_result()
implementation does not overwrite the previous temp table, so extra storage is used up each time a result is re-cached. In the example below, we can see that a new temp table is created on every call. Hopefully Snowflake can update this. I'm using Snowpark 0.8.0 at the moment.
>>> dff = session.create_dataframe([1,2,3])
>>> dff.cache_result().explain()
---------DATAFRAME EXECUTION PLAN----------
Query List:
1.
SELECT * FROM (SNOWPARK_TEMP_TABLE_X3FCJ1U38A)
...
--------------------------------------------
>>> dff.cache_result().explain()
---------DATAFRAME EXECUTION PLAN----------
Query List:
1.
SELECT * FROM (SNOWPARK_TEMP_TABLE_Z9H68STVDH)
...
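To make the storage concern concrete, here is a minimal pure-Python model of the behavior observed above (the functions and the dict are hypothetical stand-ins, not the Snowpark API): each call materializes the result under a fresh generated name, so the old copy is never reclaimed until the session ends.

```python
# Hypothetical in-memory model of the re-caching behavior shown above:
# every cache_result() call writes to a *new* randomly named temp table
# instead of overwriting the previous one, so storage accumulates.
import secrets

temp_tables = {}  # simulated temp-table storage: name -> rows

def cache_result(rows):
    """Model of cache_result(): always materializes into a fresh temp table."""
    name = f"SNOWPARK_TEMP_TABLE_{secrets.token_hex(5).upper()}"
    temp_tables[name] = list(rows)
    return name

first = cache_result([1, 2, 3])
second = cache_result([1, 2, 3])

assert first != second        # a different table name on each call...
assert len(temp_tables) == 2  # ...so both copies remain in storage
```

Re-caching the same DataFrame N times in this model leaves N tables behind, which matches the two distinct SNOWPARK_TEMP_TABLE names in the explain output above.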
Original question
In Snowpark, does the cache_result()
method of the DataFrame class essentially do the same thing as writing to a temporary table? It sure seems that way, and it would save a lot of keystrokes, but I wanted to be certain of it and to understand any potential difference in resource (credit) usage.
In other words, are there any differences to these two approaches below?
sql = '''
select * from foo
'''
# Approach 1: saving to temporary table and then assigning the table
session.sql(sql).write.mode('overwrite').save_as_table('my_tbl', table_type='temporary')
df = session.table('my_tbl')
# Approach 2: using Cache result
df = session.sql(sql).cache_result()
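Both approaches materialize the query result into a temp table, but they differ in how re-runs are stored. A small pure-Python sketch of that difference, using hypothetical stand-in functions (not the Snowpark API): Approach 1 overwrites a fixed, caller-chosen table name, while Approach 2 generates a fresh name on every call.

```python
# Hypothetical model contrasting the two approaches:
#   Approach 1: overwrite a named temp table -> storage stays bounded.
#   Approach 2: cache under a generated name -> old copies linger.
import itertools

_counter = itertools.count()
tables = {}  # simulated table storage: name -> rows

def save_as_table(rows, name):
    """Approach 1: re-runs reuse (overwrite) the same table."""
    tables[name] = list(rows)
    return name

def cache_result(rows):
    """Approach 2: re-runs each add a new generated table."""
    name = f"SNOWPARK_TEMP_TABLE_{next(_counter)}"
    tables[name] = list(rows)
    return name

for _ in range(3):
    save_as_table([1, 2, 3], "my_tbl")  # always the single table 'my_tbl'
    cache_result([1, 2, 3])             # one more table per iteration

assert len(tables) == 4  # 1 named table + 3 generated temp tables
```

In this model the choice of approach has no effect on a single run, but under repeated execution the fixed-name write keeps storage constant while the generated-name cache grows linearly.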