I cannot think of a way to output the entire file in random order without somehow keeping track of what has already been written. If I had to do a memory-efficient shuffle, I would scan the file once, building a list of the offsets at which each line starts. Once I have that list, I would repeatedly pick one offset at random, write the line at that offset to stdout, and remove the offset from the list (shuffling the list up front and then walking it in order amounts to the same thing).
I am not familiar with Perl or Python, but I can demonstrate with PHP:
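<?php
$offsets = array();
$f = fopen("file.txt", "r");
if ($f === false) exit("Cannot open file.txt\n");
// Pass 1: record the byte offset at which each line starts.
while (($offset = ftell($f)) !== false && fgets($f) !== false)
{
    $offsets[] = $offset;
}
// Shuffle the offsets rather than the lines themselves.
shuffle($offsets);
// Pass 2: seek to each line in shuffled order and print it.
foreach ($offsets as $offset)
{
    fseek($f, $offset);
    $line = fgets($f);
    // The file's last line may lack a trailing newline, so add one.
    echo $line, (substr($line, -1) === "\n") ? "" : "\n";
}
fclose($f);
?>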
The only other option I can think of, if scanning the file for newlines is absolutely unacceptable, would be the following (I am not going to code this one out):
1. Determine the file size.
2. Create a list of the offsets and lengths of the lines already written to stdout.
3. Loop until bytes_written == filesize:
4. Seek to a random offset that does not fall inside a line already in that list.
5. Back up from that offset to the previous newline or the start of the file.
6. Display that line, and add its offset and length to the list.
7. Go to 3.
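For what it is worth, the steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: printLinesByRandomSeek() is a made-up helper name, and instead of choosing an offset that is guaranteed to be unwritten, the sketch picks any random byte and simply retries when it lands in a line that was already printed. Note also that a random byte is more likely to fall inside a long line, so long lines tend to come out earlier; this is not a uniform shuffle, and the retries get more frequent as the file nears completion.

```php
<?php
// Sketch of the random-seek approach (hypothetical helper name).
function printLinesByRandomSeek(string $path): void
{
    $size = filesize($path);
    $f = fopen($path, "r");
    if ($f === false || $size === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $written = array();   // line start offset => line length in bytes
    $bytesWritten = 0;

    while ($bytesWritten < $size) {
        // Pick a random byte, then back up to the start of its line:
        // either offset 0 or the byte right after the previous newline.
        $pos = random_int(0, $size - 1);
        while ($pos > 0) {
            fseek($f, $pos - 1);
            if (fgetc($f) === "\n") {
                break;
            }
            $pos--;
        }

        if (isset($written[$pos])) {
            continue;   // this line was already printed; try again
        }

        fseek($f, $pos);
        $line = fgets($f);
        $written[$pos] = strlen($line);
        $bytesWritten += strlen($line);

        echo $line;
        if (substr($line, -1) !== "\n") {
            echo "\n";  // the file's last line may lack a trailing newline
        }
    }
    fclose($f);
}
```

Calling printLinesByRandomSeek("file.txt") would print every line of file.txt exactly once. The bookkeeping array still grows to one entry per line, so the memory saving over the offset list is in skipping the initial scan, not in the final size of the list.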