I have a data set with 36k rows. I want to randomly select 9k rows from it using pandas. How do I accomplish this task?
Viewed 7,594 times
8
- I am new to Stack Overflow. I will do it. I have clicked on the up arrow. I hope you received the upvote :-) – Niranjan Agnihotri Mar 28 '17 at 07:42
- Thanks.. you did it right ;-) – piRSquared Mar 28 '17 at 07:43
- Really a dupe of this: http://stackoverflow.com/questions/15923826/random-row-selection-in-pandas-dataframe (see the last answer) – EdChum Mar 28 '17 at 08:01
2 Answers
14
I think you can use sample to select 9k rows, or 25% of the rows:
df.sample(n=9000)
Or:
df.sample(frac=0.25)
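As a minimal sketch of the two calls above: the 36k-row frame and its column names are made up for illustration, and random_state is added only to make the draw repeatable. By default sample draws without replacement, so no row appears twice.

```python
import numpy as np
import pandas as pd

# Hypothetical 36k-row frame standing in for the real data set.
df = pd.DataFrame({"a": np.arange(36000), "b": np.arange(36000) * 2})

# Fixed number of rows; random_state makes the draw reproducible.
by_count = df.sample(n=9000, random_state=0)

# Fraction of rows; 25% of 36000 is also 9000.
by_frac = df.sample(frac=0.25, random_state=0)

# Rows not selected (relies on the index being unique).
rest = df.drop(by_count.index)
```

Dropping the sampled index, as in the last line, is one way to get the remaining 27k rows if they are needed as well.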
Another solution is to create a random sample of the index with numpy.random.choice and then select with loc. The index has to be unique:
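# replace=False so that no row is drawn twice (np.random.choice samples with replacement by default)
df = df.loc[np.random.choice(df.index, size=9000, replace=False)]
Solution if the index is not unique:
df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]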
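One thing to watch with numpy-based sampling is that choice samples with replacement unless replace=False is passed. Here is a small sketch of the positional variant using NumPy's Generator API. The frame with a repeating index is invented for the example, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels repeat (0, 1, 2, 0, 1, 2, ...).
df = pd.DataFrame({"a": np.arange(36000)}, index=np.arange(36000) % 3)

rng = np.random.default_rng(0)  # seeded only for reproducibility

# Draw 9000 distinct row positions, then select by position with iloc,
# which does not care whether the index labels are unique.
pos = rng.choice(len(df), size=9000, replace=False)
sampled = df.iloc[pos]
```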

jezrael
7
numpy
i = np.random.permutation(np.arange(len(df)))
idx = i[:9000]
pd.DataFrame(df.values[idx], df.index[idx], df.columns)  # pass df.columns so the column names are kept
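A possible variant of the permutation idea, sketched under the assumption that the frame has mixed column types: indexing with iloc instead of rebuilding from df.values keeps the column names and per-column dtypes. The example frame and the seed are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one string column.
df = pd.DataFrame({"n": np.arange(36000), "s": ["x"] * 36000})

rng = np.random.default_rng(0)  # seeded only for reproducibility

# Shuffle all row positions and keep the first 9000 of them.
idx = rng.permutation(len(df))[:9000]

# Positional selection preserves column names and dtypes.
sampled = df.iloc[idx]
```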

piRSquared