I have a dataset that is too large for my computer to read in full, so I have no choice but to read only a limited number of rows. (I cannot read everything and choose afterwards; I have to choose the rows and read them in the same step.) I also don't know the total number of rows, because the file is too big to open on my laptop. I want to select rows at random, but I have no idea how to do that. I searched, but every solution I found reads the whole file first and samples afterwards. Can anyone help?

import numpy as np

# 5000 random row numbers in [0, 100000); randint draws with replacement, so duplicates are possible
temp = np.random.randint(low=0, high=100000, size=5000)

temp holds the row numbers that I want to read.

Rachel.S
  • Check out this post: http://stackoverflow.com/questions/22258491/read-a-small-random-sample-from-a-big-csv-file-into-a-python-data-frame – Alex Nov 17 '15 at 15:36
  • Rows vary in length, so you can't tell which row is which unless you read them from the top of the file. You could randomly select an offset in the file and then scan forward to the next row (you wouldn't know its row number, but you'd know it's a row), but the selection wouldn't be purely random, because longer rows would be more likely to be hit in the original selection. – tdelaney Nov 17 '15 at 15:37
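
One way around both problems raised above (unknown row count, and the bias of random offsets) is reservoir sampling, which is a different technique from the offset idea in the comment: the file is streamed line by line, only k rows are held in memory at any time, and every row ends up in the sample with equal probability. The following is a minimal sketch, not a definitive implementation. The names `reservoir_sample` and `sample_csv_rows` and the `path` argument are hypothetical, and it assumes one record per line (no quoted fields containing newlines) with a header on the first line.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable
    whose length is not known in advance (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Keep item i with probability k / (i + 1) by overwriting
            # a random slot; randint is inclusive on both ends.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


def sample_csv_rows(path, k, seed=None):
    """Stream a CSV file (hypothetical path) and return (header, k random rows).

    Assumes the first line is a header and each record fits on one line.
    """
    with open(path, newline="") as f:
        header = next(f, None)  # None if the file is empty
        rows = reservoir_sample(f, k, seed)
    return header, rows
```

Calling `sample_csv_rows("data.csv", 5000)` (file name assumed) would still read the whole file once, but never holds more than 5000 rows in memory, and the sampled lines can then be parsed with `csv.reader` or handed to pandas through `io.StringIO`.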

0 Answers