Reading and shuffling the array would involve a lot of unnecessary data movement.
Here are a few ideas:
One: When you say you need a random subset, what exactly do you mean by "random" in this context? That is, are the records in some relevant order, or is the order irrelevant to whatever it is you're trying to randomize?
Because my first thought is that if the records are not in any relevant order, then you can get a random selection by simply calculating total size divided by sample size, and then selecting every n-th record. So for example, if you have 53 million records and you want a sample of 2 million, take 53 million / 2 million ~= 26, so read every 26th record.
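A rough sketch of that in Python (the read_record helper here is purely hypothetical, a stand-in for however you actually fetch record i, e.g. a seek into a fixed-width file):

```python
TOTAL = 53_000_000
SAMPLE = 2_000_000

def read_record(i):
    # Hypothetical stand-in for however you actually fetch record i.
    ...

step = TOTAL // SAMPLE                          # 26 for these numbers
indices = list(range(0, TOTAL, step))[:SAMPLE]  # trim the slight overshoot from rounding down
sample = [read_record(i) for i in indices]
```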
Two: If that's not adequate, a more rigorous solution would be to generate 2 million random numbers in the range 0 to 53 million, ensuring no duplicates.
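(Incidentally, if you happen to be in Python, the standard library does exactly this in one call: random.sample draws k distinct values without replacement, and it's documented to handle a range() population efficiently.)

```python
import random

indices = random.sample(range(53_000_000), 2_000_000)  # 2M distinct record numbers
indices.sort()  # optional, but sorted indices let you read the file sequentially
```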
Two-A: If your sample size were small compared to the total number of records, like if you were just picking out a few hundred or a few thousand, I'd generate an array of however many entries, and for each entry, compare it to all previous entries to check for duplicates. If it's a duplicate, loop around and try again until you find a unique value.
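Sketched out, two-A might look like this (just collecting record numbers):

```python
import random

def small_sample(total, k):
    """Idea two-A: retry on duplicates. The scan of all previous picks
    makes this O(k^2), so only use it when k is small."""
    picks = []
    while len(picks) < k:
        candidate = random.randrange(total)  # 0 .. total-1
        if candidate not in picks:           # compare against all previous entries
            picks.append(candidate)
        # duplicate: loop around and try again
    return picks

indices = small_sample(53_000_000, 500)
```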
Two-B: Assuming your numbers are not just examples but the actual values, your sample size is large compared to the total population. In that case, given the ample memory on modern computers, you should be able to do this efficiently by creating an array of 53 million booleans initialized to false, each representing one record. Then run through a loop 2 million times. For each iteration, generate a random number from 0 to 53 million and check the corresponding boolean in the array: if it's false, set it to true; if it's true, generate another random number and try again.
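A sketch of two-B; using a bytearray keeps the 53 million flags down to about 53 MB:

```python
import random

def flag_sample(total, k):
    """Idea two-B: one flag per record, retry on collisions."""
    taken = bytearray(total)  # one flag per record, all initially false
    picks = []
    while len(picks) < k:
        i = random.randrange(total)
        if not taken[i]:      # false: take it and mark it
            taken[i] = 1
            picks.append(i)
        # true: loop around and try another random number
    return picks

indices = flag_sample(53_000_000, 2_000_000)
```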
Three: Or wait, here's a better idea yet, given the relatively large percentage: Calculate the percentage of records you want to include. Then loop through a counter of all the records. For each, generate a random number from 0 to 1 and compare it to the desired percentage. If it's less, read that record and include it in the sample; otherwise, skip the record.
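That version is nearly a one-liner, with the caveat that you get approximately 2 million records rather than exactly 2 million, since it's an independent coin flip per record; the refinement below fixes that.

```python
import random

TOTAL = 53_000_000
WANT = 2_000_000
p = WANT / TOTAL  # desired fraction, about .0377

sample_indices = [i for i in range(TOTAL) if random.random() < p]
# len(sample_indices) will be close to WANT, but not exactly WANT
```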
If it's important to get the exact number of sample records, you can recalculate the percentage for each record. For example -- and to keep the example simple, let's pretend you want 10 out of 100 records:
You'd start with 10 / 100 = .1. So we generate a random number; say it comes up .04. Since .04 < .1, we include record #1.
Now we recalculate the percentage. We want 9 more records out of the 99 remaining, which gives 9 / 99 ~= .0909. Say our random number is .87. That's greater, so we skip record #2.
Recalculate again. We still need 9 records out of the 98 remaining, so the magic number is 9 / 98, whatever that comes to. Etc.
Once we've got as many records as we want, the probability for future records will be zero, so we'll never go over. And if we get near the end and haven't picked up enough records, the probability will get very close to 100%. Like, if we still need 8 records and there are only 8 records left, the probability will be 8 / 8 = 100%, so we're guaranteed to take the next record.
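Put together, that exact-count version is the classic selection-sampling technique (Knuth's Algorithm S); here's a sketch:

```python
import random

def selection_sample(total, k):
    """Keep each record with probability (still needed) / (still remaining).
    The ratio self-corrects, so this returns exactly k indices."""
    needed = k
    picks = []
    for i in range(total):
        remaining = total - i
        if random.random() < needed / remaining:
            picks.append(i)
            needed -= 1
            if needed == 0:  # got everything; probability would be zero from here on
                break
    return picks

indices = selection_sample(53_000_000, 2_000_000)  # always exactly 2M
```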