I don't think you'll be able to do it all in 1G. Note that your 200 Mvalue dataset would take ~763 MiB, leaving only ~261 MiB available for auxiliary data. This rules out any approach that requires you to store the indices at the same time as the values, since an index into 200 Mvalues would take at least 28 bits. Practically, you'd really want 32 bits, which would take the same space as the original (presumably 32-bit) floating point values.
One approach to consider would be to perform a sort on the original data while logging the decision information to a bitmap, then replacing the original data with rank indices and reversing the permutation using the log.
However, the resulting permutation would require at least log2(N!) > N log2(N) - N log2(e)
bits of storage in the worst case (so there's no way to get around it by using a radix sort or something). For the specified problem, note that log2(200M)>27
so the log could require more than (200M * 25.5) / (8bits/byte) ~ 608 MiB
-- almost as large as the original dataset, and far larger than the specified auxiliary space.
You could write the decision log to disk, and reread it to generate your answer. But if you're allowing disk I/O, you might as well do an external sort instead, which would allow you to solve problems much larger than your memory could hold.