I have data with X rows for every key (in this case, a user). X varies per key: for example, I have 1000 rows (data points) for user 1 and 50 for user 2, and the data points are usually ordered by timestamp. What is the best way to get N random rows for each key (each user)? I believe sampleByKey works if I have a fraction per key, but I need exactly N random rows for each key.
Also, if a key has fewer than N rows, what will be returned?
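
To make the desired behavior concrete, here is a plain-Python sketch (no Spark involved) of the result I am after. The function name sample_n_per_key, the example data, and the choice to return every row when a key has fewer than N rows are my own assumptions for illustration; this is not what sampleByKey does.

```python
import random
from collections import defaultdict


def sample_n_per_key(rows, n, seed=None):
    """Return up to n randomly chosen values for each key.

    rows: iterable of (key, value) pairs.
    Assumption: a key with fewer than n rows keeps all of its rows.
    """
    rng = random.Random(seed)

    # Group the values by key (here, by user).
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)

    # Sample without replacement, capped at the number of rows available.
    return {
        key: rng.sample(values, min(n, len(values)))
        for key, values in grouped.items()
    }


# Hypothetical data: user 1 has 1000 data points, user 2 has 50.
rows = [(1, t) for t in range(1000)] + [(2, t) for t in range(50)]
result = sample_n_per_key(rows, n=100, seed=0)
print({user: len(points) for user, points in result.items()})
# {1: 100, 2: 50}
```

This collects everything in memory, so it only illustrates the expected output; I am looking for the distributed equivalent.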