Assuming that each record in the file can be identified as belonging to a particular sub-population, and that you know ahead of time the size of random sample you want from each sub-population, you can hold, for each sub-population, a data structure that lets you do Reservoir Sampling for that sub-population (https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R).
So repeatedly:
Read a record.
Find out which sub-population it is in and get the data structure representing the reservoir sample for that sub-population, creating it if necessary.
Use that data structure and the record just read to do reservoir sampling for that sub-population.
At the end you will have, for each sub-population seen, a reservoir sampling data structure containing a random sample from that sub-population.
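For example, here is a minimal Python sketch of that loop using Algorithm R; the record iterable, the `get_subpopulation` function and the per-class sample size `k` are placeholders you would supply:

```python
import random

def stratified_reservoirs(records, k, get_subpopulation):
    """Keep, for each sub-population, a uniform random sample of up to k records."""
    reservoirs = {}   # sub-population -> list of kept records (the reservoir)
    counts = {}       # sub-population -> number of records of that class seen so far

    for record in records:
        group = get_subpopulation(record)
        counts[group] = counts.get(group, 0) + 1
        seen = counts[group]
        reservoir = reservoirs.setdefault(group, [])

        if len(reservoir) < k:
            # Reservoir not yet full: keep the record unconditionally.
            reservoir.append(record)
        else:
            # Algorithm R: replace a random slot with probability k / seen,
            # which leaves every record seen so far equally likely to be kept.
            j = random.randrange(seen)
            if j < k:
                reservoir[j] = record

    return reservoirs, counts
```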
For the case where you want to end up with k of the N records forming a stratified sample over the different classes, I don't think you can do much better than keeping k of each class and then downsampling from that. Suppose you could, and I give you an initial block of records organised so that your scheme keeps fewer than k/2 of some class. Now I follow that block with a huge number of records, all of that class, which is now clearly underrepresented. The final stratified sample should then take much more than k/2 of its records from this class, and (if it is really random) there should be a very small but non-zero probability that more than k/2 of those chosen records came from the first block. But because the scheme never kept more than k/2 of that class from the first block, under this scheme that probability is exactly zero, so keeping fewer than k of each class won't work in the worst case.
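So one workable (if memory-hungry) scheme is to run the per-class reservoirs above with size k and only cut them down to the stratified proportions at the end. A rough sketch, reusing the `reservoirs` and `counts` returned by the previous snippet; the proportional-allocation rounding rule here is an arbitrary choice for illustration:

```python
import random

def downsample_stratified(reservoirs, counts, k):
    """Cut per-class reservoirs (each of size up to k) down to one sample of
    roughly k records, allocated in proportion to the observed class counts."""
    total = sum(counts.values())
    sample = []
    for group, reservoir in reservoirs.items():
        # Proportional allocation, rounded; a real implementation would also
        # need a rule for handing out the rounding leftovers.
        want = min(len(reservoir), round(k * counts[group] / total))
        sample.extend(random.sample(reservoir, want))
    return sample
```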
Here is a cheat method. Suppose that, instead of reading the records sequentially, we can read them in any order we choose. If you look through Stack Overflow you will find (rather contrived) methods, based on cryptography, for generating a random permutation of N items without ever holding N items in memory, so you could do this.

Now keep a pool of k records such that, at any time, the proportions of the classes in the pool form a stratified sample, only adding or removing items from the pool when you are forced to in order to keep the proportions correct. I think this works because you need to add an item of class X to keep the proportions correct exactly when you have just observed another item of class X.

Because you went through the records in a random order, I claim you end up with a random stratified sample. Clearly you have a stratified sample, so the only possible departure from randomness is in which items are selected for a particular class. So consider the permutations that visit the items not of that class in the same order as the permutation actually chosen, but visit the items of that class in different orders. If there is bias in the way items of that class are selected (as there probably is), that bias affects different items of the class in different ways depending on which of these permutations was chosen, so averaging over the random choice among all of them leaves the overall effect unbiased.
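For the "random permutation without holding N items" part, one common construction is a small Feistel network over the index range combined with cycle-walking. This is an illustrative sketch only; the key, round count and hash-based round function are arbitrary choices, not a vetted cipher:

```python
import hashlib

def feistel_permutation(i, n, key, rounds=4):
    """Map index i to a pseudorandom position in [0, n); a bijection on 0..n-1."""
    # Work in a power-of-two domain at least as large as n, split into two halves.
    bits = max(2, n.bit_length() + (n.bit_length() & 1))  # round up to an even width
    half = bits // 2
    mask = (1 << half) - 1

    def encrypt(x):
        left, right = x >> half, x & mask
        for r in range(rounds):
            # Hash-based round function; any keyed PRF-like function would do.
            h = hashlib.sha256(f"{key}:{r}:{right}".encode()).digest()
            left, right = right, left ^ (int.from_bytes(h[:8], "big") & mask)
        return (left << half) | right

    # Cycle-walking: re-encrypt until the value lands back inside [0, n).
    j = encrypt(i)
    while j >= n:
        j = encrypt(j)
    return j

# e.g. visit the records in a pseudorandom order (needs random access by index):
# for i in range(n_records):
#     process(read_record(feistel_permutation(i, n_records, key="some key")))
```

Note that this only helps if you can read record number j directly (fixed-width records, an index, or a database), since the whole point is to visit the records in the permuted order without buffering them.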