8

I have a data set with 36k rows. I want to randomly select 9k rows from it using pandas. How do I accomplish this task?

2 Answers

14

You can use DataFrame.sample to draw either exactly 9,000 rows or 25% of the rows:

df.sample(n=9000)

Or:

df.sample(frac=0.25)
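If the same rows should come back on every run, sample also accepts a random_state seed. A minimal sketch, assuming a made-up 36k-row frame with a single column named value (both are placeholders, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row data set.
df = pd.DataFrame({"value": np.arange(36_000)})

# random_state fixes the seed, so repeated runs return the same rows.
by_count = df.sample(n=9000, random_state=0)
by_fraction = df.sample(frac=0.25, random_state=0)

print(len(by_count), len(by_fraction))
```

By default sample draws without replacement, so no row appears twice.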

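Another solution is to draw a random sample of index labels with numpy.random.choice and then select them with loc. The index has to be unique, and replace=False is needed because choice samples with replacement by default:

df = df.loc[np.random.choice(df.index, size=9000, replace=False)]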

If the index is not unique, sample row positions instead and select them with iloc:

df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]
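
A self-contained version of the positional approach, as a rough sketch. The frame, its repeating index, and the seed are assumptions made up for illustration. It uses numpy's Generator API (np.random.default_rng) instead of np.random.choice, and passes replace=False so that no position is drawn twice:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a deliberately non-unique index.
df = pd.DataFrame({"value": np.arange(36_000)}, index=np.arange(36_000) % 100)

# Seeded generator, chosen only so the sketch is repeatable.
rng = np.random.default_rng(0)

# Draw 9,000 distinct row positions; replace=False rules out duplicates.
positions = rng.choice(len(df), size=9000, replace=False)
sampled = df.iloc[positions]

print(sampled.shape)
```

Because the selection is by position rather than by label, repeated index labels do not matter here.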
jezrael
7

Using numpy:

i = np.random.permutation(np.arange(len(df)))
idx = i[:9000]
pd.DataFrame(df.values[idx], df.index[idx], df.columns)
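
One caveat with rebuilding the frame from df.values: on a frame with mixed column types, values is a single object array, so the original dtypes are lost. A small sketch comparing that with indexing through iloc, assuming a made-up frame with columns id and label (both hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame standing in for the real data.
df = pd.DataFrame({"id": np.arange(36_000), "label": ["x"] * 36_000})

# First 9,000 entries of a random shuffle of the row positions.
idx = np.random.permutation(len(df))[:9000]

# Rebuilding from the raw array keeps the rows but not the column dtypes.
via_values = pd.DataFrame(df.values[idx], df.index[idx], df.columns)

# Positional indexing keeps the columns and their dtypes.
via_iloc = df.iloc[idx]

print(via_values.dtypes)
print(via_iloc.dtypes)
```

When every column shares one numeric dtype, the two results should match, so the difference only shows up on mixed data.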
piRSquared