
Using the sample() function I can get random rows. My data set has 1000000 rows and I want a subset of 20000 rows. Importing random lines can be done with this solution:

https://stackoverflow.com/a/22259008/8966221

Reading the dataset:

import pandas as pd
dataset = pd.read_csv(file_path)

dataset_sub = dataset.sample(20000, random_state=1)

However, I want to select the random rows only from row numbers 250000 to 750000. Is there a way to do that?

Devarshi Mandal

3 Answers


What you can do is create a DataFrame containing only rows 250000 to 750000, then select 20000 random rows from it. Note that .loc slices by index label and includes both endpoints, so this assumes the default integer index.

dataset_sub = dataset.loc[250000:750000].sample(20000, random_state=1)
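
If the index is not the default 0..N-1 range (for example after filtering or sorting), slicing by position with .iloc may be safer than .loc. Here is a minimal sketch on a toy DataFrame; the helper name sample_rows_between and the column name "value" are made up for illustration, and the stop position is exclusive here, unlike the .loc slice above.

```python
import pandas as pd


def sample_rows_between(df, start, stop, n, seed=1):
    """Sample n rows from positions start..stop-1 (stop is exclusive)."""
    # .iloc slices by position, so it does not depend on the index labels.
    window = df.iloc[start:stop]
    # Sampling is without replacement by default, so n must not exceed len(window).
    return window.sample(n, random_state=seed)


# Toy data standing in for the real 1000000-row dataset.
df = pd.DataFrame({"value": range(1000)})
subset = sample_rows_between(df, 250, 750, 20)
```

For the dataset in the question, the equivalent call would be sample_rows_between(dataset, 250000, 750000, 20000).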
Andreas

I think you need this:

import pandas as pd
dataset = pd.read_csv(file_path)
# Restrict to positions 250000..749999, then draw 20000 rows from that window.
dataset_sub = dataset.iloc[250000:750000].sample(20000, random_state=1)

I think the following code works:

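import random
# Draw 20000 distinct row numbers from 250000..749999 (range excludes the stop value).
a = random.sample(range(250000, 750000), 20000)
# .loc looks rows up by label, so this assumes the default 0..N-1 index.
data = dataset.loc[a]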
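
The same idea of picking the row numbers up front can also be applied while reading the file, so that only the sampled rows are ever loaded. This is a rough sketch, assuming the CSV has a single header line; the helper name read_sampled_rows and the in-memory toy CSV are made up for illustration. It relies on the skiprows parameter of pandas.read_csv, which accepts a callable that receives the 0-based line number and returns True for lines to skip.

```python
import io
import random

import pandas as pd


def read_sampled_rows(source, start, stop, n, seed=1):
    """Read only n randomly chosen data rows from positions start..stop-1."""
    rng = random.Random(seed)
    keep = set(rng.sample(range(start, stop), n))
    # File line 0 is the header; data row i sits on file line i + 1.
    return pd.read_csv(
        source,
        skiprows=lambda line: line > 0 and (line - 1) not in keep,
    )


# Toy CSV standing in for the real file: one "value" column, 1000 data rows.
csv_text = "value\n" + "\n".join(str(i) for i in range(1000))
subset = read_sampled_rows(io.StringIO(csv_text), 250, 750, 20)
```

The returned DataFrame gets a fresh 0..n-1 index, so the original row numbers are not preserved unless the file itself has a column that records them.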
Enayat