I have a large CSV file of unknown size, possibly more than 4 GB. I need to read a random sample of rows from it to use as test cases for an application.
Reading the whole file into memory is not an option because it throws an OutOfMemoryError.
One solution is to generate a list of random numbers in the range from 0 to the total number of rows and sort it, then read the file line by line and keep the lines whose numbers appear in the list. That gives me a random set of complete rows from the CSV file.
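This approach needs the total number of rows before any random numbers can be generated, which means one extra pass over the file. A minimal sketch of that counting pass, assuming every row sits on exactly one line (no quoted fields containing line breaks) and the file is UTF-8; `LineCounter` and `countLines` are hypothetical names used only for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineCounter {
    // Streams the file line by line, so only one line is held in memory
    // at a time, no matter how large the file is.
    static long countLines(Path csv) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```

The result is a `long`; if it is known to fit in an `int`, `Math.toIntExact(LineCounter.countLines(path))` could supply the total.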
Is there a library or method to read complete rows at random from a large CSV file?
My current implementation:
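    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Generate random row indices (duplicates are possible).
    List<Integer> indexList = new ArrayList<>();
    for (int i = 0; i < testCount; i++) {
        int random = faker.numberBetween(0, total);
        indexList.add(random);
    }
    // Sort so the file can be read in a single forward pass.
    Collections.sort(indexList);
    // Read the file line by line and keep the selected rows.
    List<String> list = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream("test.csv"), StandardCharsets.UTF_8))) {
        String line;
        int lineNum = 0;
        int pos = 0;
        // Stop as soon as every index is consumed (also covers testCount == 0).
        while (pos < indexList.size() && (line = reader.readLine()) != null) {
            // A duplicated index adds the same row more than once.
            while (pos < indexList.size() && indexList.get(pos) == lineNum) {
                list.add(line);
                pos++;
            }
            lineNum++;
        }
    }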
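One alternative that would avoid knowing the total in advance is reservoir sampling, which is a different technique from the sorted-index approach above. The following is only a sketch under assumptions: one row per line (no quoted fields containing line breaks), UTF-8 input, and `java.util.Random` standing in for `faker`. `ReservoirSampler` and `sampleLines` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class ReservoirSampler {
    // Returns up to k distinct lines chosen uniformly at random in one pass.
    // Memory use is bounded by k lines, independent of the file size.
    static List<String> sampleLines(Path csv, int k, Random rng) throws IOException {
        List<String> reservoir = new ArrayList<>();
        if (k <= 0) {
            return reservoir;
        }
        long seen = 0; // number of lines read so far
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                seen++;
                if (reservoir.size() < k) {
                    // Fill the reservoir with the first k lines.
                    reservoir.add(line);
                } else {
                    // Pick a slot in [0, seen); the new line replaces an
                    // existing entry with probability k / seen.
                    long j = (long) (rng.nextDouble() * seen);
                    if (j < k) {
                        reservoir.set((int) j, line);
                    }
                }
            }
        }
        return reservoir;
    }
}
```

Unlike the code above, this never returns the same row twice, and the rows do not come back in file order. If the file has fewer than `k` lines, all of them are returned.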