Selecting random rows from a DataFrame / CSV file in Python between a designated start and end row number

Question

Using the sample() function I can get random rows. My data set has 1,000,000 rows and I want a subset of 20,000 rows. Importing random lines can be done with this solution:

https://stackoverflow.com/a/22259008/8966221

import pandas as pd

# reading a dataset
dataset = pd.read_csv(file_path)

# draw 20000 random rows from the whole dataset
dataset_sub = dataset.sample(20000, random_state=1)

However, I want to select the random rows only from between row numbers 250000 and 750000. Is there a way to do that?

score 1 · Answer 1 · answered Nov 19 '18 at 07:10

You can create a DataFrame containing only rows 250000 to 750000, then select 20000 random rows from it. Note that .loc slices by index label and includes both endpoints, so this assumes the default integer index.

dataset_sub = dataset.loc[250000:750000].sample(20000, random_state=1)
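
If the index is not the default 0..N-1 range, a positional slice with .iloc may be safer. The following is only a minimal sketch on a small synthetic frame; the column name `value` and the scaled-down bounds are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Small synthetic frame standing in for the real dataset (hypothetical data).
dataset = pd.DataFrame({"value": range(1000)})

# Scaled-down stand-ins for 250000, 750000 and 20000.
start, stop, n = 250, 750, 20

# iloc slices by position and excludes the stop row, like a normal Python slice.
window = dataset.iloc[start:stop]

# Sample n distinct rows from the window; random_state makes the draw repeatable.
dataset_sub = window.sample(n=n, random_state=1)

print(len(dataset_sub))
```

The sampled rows keep their original index labels, so they can still be traced back to their positions in the full dataset.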

Andreas


Thanks, this is helpful – Devarshi Mandal Nov 19 '18 at 10:16

score 0 · Answer 2 · answered Nov 19 '18 at 07:08

I think you need this:

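import random
from pandas import read_csv

dataset = read_csv(file_path)
# randint picks the sample size here: a random number of rows
# (between 250000 and 750000) is drawn from the whole dataset
dataset_sub = dataset.sample(random.randint(250000, 750000), random_state=1)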

Rudolf Morkovskyi


Thanks for your reply, but I want to randomly extract only, say, 20,000 rows. I think that argument also needs to be passed – Devarshi Mandal Nov 19 '18 at 10:14

score 0 · Accepted Answer · answered Nov 22 '18 at 15:21

I think the following code works:

import random

# draw 20000 distinct row labels from 250000..749999, then select those rows
a = random.sample(range(250000, 750000), 20000)
data = dataset.loc[a]
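
The same idea can be applied while reading the CSV, so that only the chosen rows are loaded. This is a hedged sketch, not the answerer's method: it assumes a file with a single header line, uses a hypothetical in-memory CSV in place of the real file, and relies on the callable form of the `skiprows` parameter of `pandas.read_csv`.

```python
import io
import random

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file: one header line,
# then 1000 data rows whose value equals their 0-based row number.
csv_text = "value\n" + "\n".join(str(i) for i in range(1000))

# Scaled-down stand-ins for 250000, 750000 and 20000.
start, stop, n = 250, 750, 20

random.seed(1)  # fixed seed so the draw is repeatable

# File line 0 is the header, so data row i sits on file line i + 1.
keep = {i + 1 for i in random.sample(range(start, stop), n)}
keep.add(0)  # always keep the header line

# skiprows accepts a callable that returns True for lines to skip.
data = pd.read_csv(io.StringIO(csv_text), skiprows=lambda line: line not in keep)

print(len(data))
```

Because the unwanted lines are skipped during parsing, the full file never has to be held in memory as a DataFrame. The resulting frame gets a fresh 0..n-1 index rather than the original row numbers.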

Enayat

