
So I have a big CSV file and my code prints all the rows, but I want to print only, for example, 20 random rows out of 100,000. I know this can be done somehow with random.sample, but I don't really know how. Any suggestions?

Here is my code:

import csv

with open(r'Z:/**/**/**/test_examples_doors/**') as csvfile:
    data = csv.DictReader(csvfile)
    for row in data:
        if row['open'] == '1':
            print(row['image'], row['open'])
BreadX

2 Answers


I assume you want to randomly sample your data, rather than just take the first 20 rows?

In this case you can convert data to a list and then sample it:

import csv
import random

with open(r'Z:/datasets/room-segmentation/labeling/test_examples_doors/labels.csv') as csvfile:
    data = csv.DictReader(csvfile)
    # Read the rows while the file is still open, then pick 20 at random
    sampled_data = random.sample(list(data), 20)
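
The question's code only prints rows where open is '1', so you may want to apply that filter before sampling. A minimal sketch, assuming that is what you are after: the function name sample_open_rows, the seed parameter and the in-memory demo data are made up for illustration and are not part of the original code.

```python
import csv
import io
import random


def sample_open_rows(csvfile, k, seed=None):
    """Return up to k random rows whose 'open' column equals '1'."""
    rng = random.Random(seed)  # seed is optional, only for reproducible runs
    reader = csv.DictReader(csvfile)
    # Keep only the rows the question's code would have printed
    open_rows = [row for row in reader if row['open'] == '1']
    # random.sample raises ValueError if k exceeds the population size,
    # so cap k at the number of matching rows
    return rng.sample(open_rows, min(k, len(open_rows)))


# Tiny in-memory stand-in for the real CSV file (made-up data)
demo = io.StringIO(
    "image,open\n" + "".join(f"img{i}.png,{i % 2}\n" for i in range(50))
)
for row in sample_open_rows(demo, 5, seed=0):
    print(row['image'], row['open'])
```

With a real file you would pass the csvfile object from the with block instead of the StringIO stand-in.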
Mentastin
    If you know the number of rows in advance, note that it would be more efficient to first select which rows to keep. You'd waste much less memory and processing time than by parsing and storing all the rows just to throw most of them away at the end ;) – mozway Nov 29 '21 at 12:48
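
The idea in the comment above could be sketched roughly like this, assuming the number of data rows is known beforehand. The function name sample_known_length, its parameters and the demo data are hypothetical. Rows are still read one by one by csv.DictReader, but only the chosen ones are stored, and reading stops once the last chosen row has been found.

```python
import csv
import io
import random


def sample_known_length(csvfile, n_rows, k, seed=None):
    """Pick k row positions up front, then keep only those rows while reading."""
    rng = random.Random(seed)  # seed is optional, only for reproducible runs
    # Choose which data-row positions (0-based, header excluded) to keep
    wanted = set(rng.sample(range(n_rows), k))
    reader = csv.DictReader(csvfile)
    kept = []
    for i, row in enumerate(reader):
        if i in wanted:
            kept.append(row)
            if len(kept) == k:
                break  # all chosen rows found, no need to read further
    return kept


# In-memory stand-in for a CSV with 1000 data rows (made-up data)
demo = io.StringIO(
    "image,open\n" + "".join(f"img{i}.png,{i % 2}\n" for i in range(1000))
)
for row in sample_known_length(demo, n_rows=1000, k=20, seed=0):
    print(row['image'], row['open'])
```

Note that the sampled rows come back in file order rather than in random order; call random.shuffle on the result if the order matters.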

If you don't need to code this yourself, GoCSV has a sample command that does just this:

gocsv sample -n 20 labels.csv
Zach Young