I've devised a way to do reservoir sampling in java, the code I used is here.
I've put in a huge file to be read now, and it takes about 40 seconds to read the lot before out putting the results to screen, and then reading the lot again. The file is too big to store in memory and just pick a random sample from that.
I was hoping I could write an extra while loop in there to get it to out put my reservoirList
at a set period of time, and not just after it finished scanning the file.
Something like:
long startTime = System.nanoTime();
timeElapsed = 0;
while(sc.hasNext()) //avoid end of file
do{
long currentTime = System.nanoTime();
timeElapsed = (int) TimeUnit.MILLISECONDS.convert(startTime-currentTime,
TimeUnit.NANOSECONDS);
//sampling code goes here
}while(timeElapsed%5000!=0)
return reservoirList;
} return reservoirList;
But this outputs a bunch (not the full length of my ReservoirList) of lines and then a whole stream (a few hundred?) of the same line.
Is there a more elegant way to do this? One that, perhaps, works if possible.