How to take a random sample of a CSV file in Python?

Question

I am new to Python and want to learn the data wrangling process with it. I am using Jupyter for this.

I have a CSV file, loaded into a DataFrame named fle, with 81,000 rows and 89 columns. I want to randomly select about 100 rows from it. How do I do that? I keep getting the following error.

fle=pd.read_csv("C:\Users\Mine\Documents\ssample.csv", low_memory=  False)
import random
sampl = random.sample(fle, 10)

The error I am getting is:

IndexError                                Traceback (most recent call last)
<ipython-input-37-fa4ec429f883> in <module>()
      1 import random
      2 #To take a sample of 10000 samples
 ----> 3 sampl = random.sample(fle, 10)
      4 #pd.DataFrame(sampler).head(10)

  C:\Users\E061921\AppData\Local\Continuum\Anaconda\lib\random.pyc in sample(self, population, k)
334             for i in xrange(k):         # invariant:  non-selected at [0,n-i)
335                 j = _int(random() * (n-i))
--> 336                 result[i] = pool[j]
337                 pool[j] = pool[n-i-1]   # move non-selected item into vacancy
338         else:

IndexError: list index out of range

Just use numpy's [random.choice](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html#numpy.random.choice) on ```np.arange(number_of_rows)``` with ```replace=False```, then index your dataframe with iloc as described [here](http://stackoverflow.com/questions/16096627/pandas-select-row-of-data-frame-by-integer-index). – sascha, Jun 23 '16 at 20:29
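The suggestion in this comment can be sketched roughly as follows. The likely reason the original call fails is that `random.sample` takes the population size from `len(fle)` (the number of rows) but builds its pool by iterating over the DataFrame, which yields the 89 column labels, so the index runs out of range. The small DataFrame below is a hypothetical stand-in for the real file, and the final line uses pandas' own `DataFrame.sample`, which is not mentioned in the thread but does the same job in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 81,000 x 89 frame loaded with pd.read_csv
fle = pd.DataFrame({"a": range(1000), "b": range(1000, 2000)})

# Draw 100 distinct row positions (replace=False means no repeats)
positions = np.random.choice(np.arange(len(fle)), size=100, replace=False)

# Select those rows by integer position
sampl = fle.iloc[positions]

# Alternative (an addition, not from the comment): pandas' built-in sampler
sampl2 = fle.sample(n=100)
```

Both `sampl` and `sampl2` are ordinary DataFrames with 100 rows and all of the original columns.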

Jordan Bonitatis · Answer 1 · 2016-06-23T22:04:12.813

1

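Use random.choice instead of sample. You can use csv.DictReader to handle the CSV as a list of dicts.

import csv
import random

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"C:\Users\Mine\Documents\ssample.csv", "r") as csvfile:
    reader = csv.DictReader(csvfile)
    # Read every row while the file is still open
    rows = [r for r in reader]

# Dicts are not hashable, so collect unique row indices rather than rows.
# This loop assumes the file has at least 100 rows.
random_indices = set()
while len(random_indices) < 100:
    random_indices.add(random.choice(range(len(rows))))

random_rows = [rows[i] for i in random_indices]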
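As an alternative sketch that is not part of this answer: once the rows are in a plain list, `random.sample` also works and picks distinct rows without a loop. The in-memory CSV text and its column names below are made up for illustration, standing in for the real file.

```python
import csv
import io
import random

# Hypothetical CSV content standing in for ssample.csv
csv_text = "id,name\n" + "\n".join(f"{i},row{i}" for i in range(50))

# Parse the CSV into a list of dicts, one per data row
rows = list(csv.DictReader(io.StringIO(csv_text)))

# random.sample draws k distinct items from a list (no replacement)
random_rows = random.sample(rows, 10)
```

`random.sample` raises a ValueError if asked for more items than the list holds, so the sample size has to stay at or below `len(rows)`.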

edited Jun 23 '16 at 22:04

answered Jun 23 '16 at 21:58


Thank you so much. I really appreciate your help. – Dee Jun 24 '16 at 13:25
