
I want to sample 1% of the rows of a table. To do this, I am doing:

val df = spark.sql("select * from <table>").sample(0.01) // sample(fraction) takes a Double, not a String
df.collect()

This scans my entire table, which holds a lot of data (~100 GB). Is there any way to sample 1% of the records without reading the whole table, by reading only part of the data (~1-2 GB)?
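One possible workaround, sketched here under the assumption that the table is stored as Parquet files: `Dataset.sample` drops rows only after they have been read, so every file is still scanned. Picking about 1% of the table's files and reading only those keeps I/O close to 1% of the data, provided the files are of similar size. The names `FileLevelSample`, `pickFiles` and `sampleByFiles` are hypothetical. This is a file-level (cluster) sample, not a uniform row sample: rows stored in the same file are selected together. Also note that `Dataset.inputFiles` is documented only as a best-effort snapshot of the files behind a Dataset.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.util.Random

object FileLevelSample {

  /** Picks roughly `fraction` of the given files (at least one), reproducibly for a given seed. */
  def pickFiles(files: Seq[String], fraction: Double, seed: Long): Seq[String] = {
    require(fraction > 0.0 && fraction <= 1.0, "fraction must be in (0, 1]")
    val k = math.max(1, math.ceil(files.size * fraction).toInt)
    // Sorting first makes the choice independent of the order in which files were listed.
    new Random(seed).shuffle(files.sorted).take(k)
  }

  /**
   * Reads only the chosen files instead of the whole table.
   * ASSUMPTION: the table's data files are Parquet; use the matching reader for other formats.
   */
  def sampleByFiles(spark: SparkSession, table: String, fraction: Double, seed: Long = 42L): DataFrame = {
    // inputFiles returns file paths from the table's file listing; it does not read row data.
    val allFiles = spark.table(table).inputFiles.toSeq
    val chosen = pickFiles(allFiles, fraction, seed)
    spark.read.parquet(chosen: _*)
  }
}
```

For example, `FileLevelSample.sampleByFiles(spark, "<table>", 0.01).collect()` would read roughly 1% of the files. If the table is partitioned by directory, the partition columns may not be recovered when individual files are read directly unless the reader is given `.option("basePath", ...)` pointing at the table root; the sketch leaves that out. Calling `.sample(...)` on the result can thin it further if a smaller or less clustered sample is needed.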

zero323
moriarty007
