Modified Fisher-Yates algorithm
The shuffle solution can be improved, since you only need to shuffle the first k elements of the array. But that's still O(n), because the naïve shuffle implementation requires an array of size n, which must be initialized to the n values from 0 to n-1:
Initialize value[n] to {0..n-1}
For i from 0 to k-1:
    swap(value[i], value[random_in_range(i, n)])
Result is value[0..k-1]
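As a concrete sketch, the partial shuffle above might look like this in Python (the function name is my choice, and `random.randrange` stands in for `random_in_range`):

```python
import random

def sample_partial_shuffle(n, k):
    """Select k distinct values from range(n) via a partial Fisher-Yates shuffle.

    Still O(n) time and space: the full array must be allocated and
    initialized, even though only k swap steps are performed.
    """
    value = list(range(n))
    for i in range(k):
        j = random.randrange(i, n)              # random index in [i, n)
        value[i], value[j] = value[j], value[i]
    return value[:k]
```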
To improve on that, we can use a kind of virtual array, consisting of two parts:
value: An array of the first k elements, which will be the resulting selection. This is initialized to {0..k-1}
rest: A sparse data structure (a hashtable, for example) with capacity for k entries, containing those entries of the rest of the array whose value is no longer simply their own index. Initially empty.
Now we can define the function which swaps elements i and j of the virtual array (note that i < k is guaranteed by the algorithm above):
# To swap elements i and j
If j < k:
    # Both elements to be swapped are in the selection
    tmp = value[i]; value[i] = value[j]; value[j] = tmp
Else If j in rest:
    # Element j has been swapped before
    tmp = value[i]; value[i] = rest[j]; rest[j] = tmp
Else:
    # The value at j is still j; we now add it to the virtual array
    rest[j] = value[i]; value[i] = j
That uses O(k) space and time, for any value of k≤n.
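Putting the pieces together, the whole virtual-array selection might be sketched in Python as follows (the function name is my choice, and a dict plays the role of the sparse hashtable `rest`):

```python
import random

def sample_virtual_shuffle(n, k):
    """Select k distinct values from range(n) in O(k) time and space.

    Only the first k slots are materialized in `value`; any other
    position j that has been touched lives in the sparse map `rest`,
    and a position absent from `rest` implicitly still holds value j.
    """
    value = list(range(k))   # the selection being built
    rest = {}                # sparse view of positions k..n-1
    for i in range(k):
        j = random.randrange(i, n)
        if j < k:
            # Both elements to be swapped are in the selection
            value[i], value[j] = value[j], value[i]
        elif j in rest:
            # Element j has been swapped before
            value[i], rest[j] = rest[j], value[i]
        else:
            # The value at j is still j; add it to the virtual array
            rest[j] = value[i]
            value[i] = j
    return value
```

Note that `rest` never grows beyond k entries, since at most one entry is added per iteration, which is what keeps the space bound at O(k) even for very large n.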
Three-algorithm strategy
A simpler solution using O(k) memory is to just keep a hashtable of all the selected values, and generate values until the hashtable contains k values, rejecting duplicates.
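A minimal sketch of that rejection approach, using a Python set as the hashtable (the function name is my choice):

```python
import random

def sample_rejection(n, k):
    """Select k distinct values from range(n) by rejecting duplicates.

    Fast when k is much smaller than n; degenerates as k approaches n,
    because most draws are rejected once the set is nearly full.
    """
    if k > n:
        raise ValueError("k must not exceed n")
    seen = set()
    while len(seen) < k:
        seen.add(random.randrange(n))  # duplicates are absorbed by the set
    return list(seen)
```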
For small k, the probability that a randomly-selected element is a duplicate is insignificant, and the naïve hashtable is certainly the simplest solution. On the other hand, if k is a significant fraction of n, the hash table is mostly wasted space, since a simple bit vector of size n would be sufficient to record which values have been seen. And for very large k, the rejection algorithm takes too much time as the sample fills up, while the full n-element vector needed for the shuffle is not much more space than the vector that would be used to hold the sample.
As a result, the pragmatic solution is probably to use whichever of the three solutions uses the least space and time: for values of k sufficiently large that the n-bit bitvector is smaller than a hash table with k entries, but not so large that the probability of rejection is significant (say, n/64 ≤ k ≤ n/4), use the bitvector. For smaller values of k, use the simple hashtable algorithm, and for values of k close to n, use a Fisher-Yates shuffle on the full n-element vector (but limited to k steps).
Since we choose the O(n) strategies only in cases where k>cn for some constant c, the composite algorithm is still O(k) time and space because with that constraint, n is in O(k).
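The composite strategy could be sketched as a single dispatcher like this; the function name and the exact cutoffs n/64 and n/4 follow the rough thresholds suggested above, not precisely tuned constants:

```python
import random

def sample(n, k):
    """Choose among the three strategies by the size of k relative to n."""
    if k <= n // 64:
        # Small k: hashtable rejection sampling, O(k) space.
        seen = set()
        while len(seen) < k:
            seen.add(random.randrange(n))
        return list(seen)
    elif k <= n // 4:
        # Middling k: rejection sampling with an n-bit vector, O(n) bits.
        bits = bytearray((n + 7) // 8)
        result = []
        while len(result) < k:
            v = random.randrange(n)
            if not bits[v >> 3] & (1 << (v & 7)):
                bits[v >> 3] |= 1 << (v & 7)
                result.append(v)
        return result
    else:
        # Large k: partial Fisher-Yates over the full n-element array.
        value = list(range(n))
        for i in range(k):
            j = random.randrange(i, n)
            value[i], value[j] = value[j], value[i]
        return value[:k]
```

Since the two O(n) branches are only entered when k > n/64, n is in O(k) whenever they run, so the dispatcher as a whole stays O(k) in time and space.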