I want to sample 1% of the rows of a table. To do this, I am doing:
val df = spark.sql("select * from <table>").sample(0.01)
df.collect()
This scans my entire table, which holds a lot of data (~100 GB). Is there any way to sample 1% of the records by reading only part of the data (~1-2 GB) instead of the whole table?
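Some context on why the full scan happens: `sample(0.01)` decides row by row whether to keep each record, so Spark still has to read every row before it can discard 99% of them. One commonly suggested workaround, assuming the table is stored as files Spark can list (for example Parquet), is to sample at the file level instead: pick about 1% of the input files and read only those. The helper below is a minimal sketch of that selection step. `pickFiles` is a hypothetical name, and the fixed seed is an assumption made so the choice is reproducible.

```scala
import scala.util.Random

// Hypothetical helper: choose roughly `fraction` of the given file paths
// (at least one when the list is non-empty), using a fixed seed so the
// same subset is picked on every run.
def pickFiles(files: Seq[String], fraction: Double, seed: Long = 42L): Seq[String] = {
  require(fraction > 0.0 && fraction <= 1.0, "fraction must be in (0, 1]")
  // Round up so a small fraction of a few files still yields one file.
  val n = math.max(1, math.ceil(files.size * fraction).toInt)
  new Random(seed).shuffle(files).take(n)
}
```

With Spark this could be wired up as `spark.read.parquet(pickFiles(spark.table("<table>").inputFiles.toSeq, 0.01): _*)`, where `Dataset.inputFiles` returns a best-effort list of the files behind the table. This is a cluster sample rather than a uniform row sample: rows that sit in the same file are kept or dropped together, so the result can be skewed if files are organized by some column. For a partitioned table, the partition columns are only recovered if the `basePath` option points at the table root.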