This question is similar to this one here, but applied to pandas.

    import pandas as pd

    df = pd.DataFrame({'tid': [0]*44 + [2]*66, 'fidx': list(range(44)) + list(range(66))})

I need to sample 10 'fidx' values per 'tid' such that the sampled values are spaced as far apart as possible.
I figured out how to do it with the function below; however, I think this can be done with df.groupby
and some other functions, but I can't seem to work out how.
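
    from itertools import chain

    def sampling(df):
        # First and last row label of each contiguous 'tid' block
        mins = df['tid'].drop_duplicates().index
        maxes = df['tid'].drop_duplicates(keep='last').index
        frames = []
        for mi, ma in zip(mins, maxes):
            # 10 evenly spaced row labels starting at mi and heading towards ma
            frames.append([mi + int(x*(ma-mi)/10) for x in range(10)])
        # Flatten the list of lists into a single list of row labels
        frames = list(chain(*frames))
        return frames
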
The worst part is having to flatten the list at the end.
Expected output:

    frames = sampling(df)
    df.iloc[frames, :]

         tid  fidx
    0      0     0
    4      0     4
    8      0     8
    12     0    12
    17     0    17
    21     0    21
    25     0    25
    30     0    30
    34     0    34
    38     0    38
    44     2     0
    50     2     6
    57     2    13
    63     2    19
    70     2    26
    76     2    32
    83     2    39
    89     2    45
    96     2    52
    102    2    58

That is, 10 fidx values for each tid, with the fidx values spaced as evenly as possible.
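
For what it's worth, here is a rough sketch of the groupby direction I have in mind. It is not a definitive solution. The helper name `evenly_spaced` and its `n` parameter are made up for illustration, and it assumes every group has at least `n` rows (smaller groups would yield repeated rows). It picks positions relative to the start of each group, so no flattening step is needed, and on the sample frame it should select the same row labels as the loop version.

```python
import pandas as pd

df = pd.DataFrame({'tid': [0]*44 + [2]*66,
                   'fidx': list(range(44)) + list(range(66))})

def evenly_spaced(group, n=10):
    # Hypothetical helper: same spacing rule as mi + int(x*(ma-mi)/10),
    # but expressed as positions inside the group (0 .. len(group)-1).
    last = len(group) - 1
    positions = [x * last // n for x in range(n)]
    return group.iloc[positions]

# Iterate over the groups and stitch the sampled rows back together.
sampled = pd.concat([evenly_spaced(g) for _, g in df.groupby('tid')])
print(sampled)
```

Iterating over `df.groupby('tid')` and concatenating sidesteps the version-dependent behaviour of `groupby.apply`, at the cost of a Python-level loop over the groups.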