We need to perform random sampling in Apache Spark using Java. Specifically, we need to pick an exact number of random records from a Dataset.
We are using the code below, but sometimes it doesn't return the exact number of records.
sampledDataSet=sampledDataSet.union(specficClassName.orderBy(rand()).limit(500));
Illustration:
Suppose the Dataset specficClassName has 700 records. The sample then contains 650 records, even though we specified a limit of 500 in the code above.
Most of the time we do not get exactly 500 records.
Which function should we use to get an exact number of records?
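To make the expected behaviour concrete, here is a minimal plain-Java sketch (no Spark involved) of what we want the sampling step to do: shuffle the records randomly and keep exactly the first n. The class name ExactSampleSketch, the method sampleExactly, and the seed parameter are made up for illustration only and are not part of our real code or of the Spark API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: shows the behaviour we expect, not Spark code.
public class ExactSampleSketch {

    // Returns exactly n randomly chosen records, or all records if fewer than n exist.
    static <T> List<T> sampleExactly(List<T> records, int n, long seed) {
        // Copy first so the caller's list is not reordered.
        List<T> shuffled = new ArrayList<>(records);
        // Random order, analogous to orderBy(rand()) in our Spark code.
        Collections.shuffle(shuffled, new Random(seed));
        // Keep the first n records, analogous to limit(n).
        int size = Math.min(n, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, size));
    }

    public static void main(String[] args) {
        // 700 records, as in the illustration above.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 700; i++) {
            records.add(i);
        }
        List<Integer> sample = sampleExactly(records, 500, 42L);
        System.out.println(sample.size()); // prints 500
    }
}
```

With 700 input records and a limit of 500, this always yields 500 records. That is the result we would like to get from the Spark version as well.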