My aim is to read a large file, process two lines at a time, and write the results to one or more new files. These files can be very large, from 1 GB to 150 GB, so I'd like to do this processing using as little RAM as possible.
The processing is very simple: each line is split on tabs, certain fields are selected, and the resulting String is written to the new files.
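To make the per-line transformation concrete, here is a minimal sketch of it in isolation. The class name `RecordFormatter`, the method name `toRecord`, and the sample line are hypothetical and only for illustration; the column indices (0, 9 and 10) and the output layout are the ones my code uses. It splits each line once instead of three times, which I assume is equivalent for my data.

```java
class RecordFormatter {

    /**
     * Builds the four-line output record from one tab-delimited input line.
     * The suffix is ".1" for the first line of a pair and ".2" for the second.
     */
    static String toRecord(String line, String suffix) {
        // Split once and reuse the array rather than re-splitting per field.
        String[] fields = line.split("\t");
        if (fields.length < 11) {
            throw new IllegalArgumentException(
                    "Expected at least 11 tab-separated fields, got " + fields.length);
        }
        // Same layout as String.format("%s\n%s\n+\n%s", ...) in my attempts.
        return fields[0] + suffix + "\n" + fields[9] + "\n+\n" + fields[10];
    }

    public static void main(String[] args) {
        // Hypothetical sample line with 11 tab-separated fields.
        String line = "read1\tb\tc\td\te\tf\tg\th\ti\tACGT\tIIII";
        System.out.println(toRecord(line, ".1"));
    }
}
```

Lines starting with "@" are skipped before this step, so they never reach the formatter.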
So far I have tried using BufferedReader to read the file and PrintWriter to output the lines to a file:
while ((line1 = br.readLine()) != null) {
    if (!line1.startsWith("@")) {
        line2 = br.readLine();
        recordCount++;
        one.println(String.format("%s\n%s\n+\n%s", line1.split("\t")[0] + ".1", line1.split("\t")[9], line1.split("\t")[10]));
        two.println(String.format("%s\n%s\n+\n%s", line2.split("\t")[0] + ".2", line2.split("\t")[9], line2.split("\t")[10]));
    }
}
I have also tried using Java 8 Streams to read from the file and write to the output files:
stream.forEach(line -> {
    if (!line.startsWith("@")) {
        try {
            if (counter.getAndIncrement() % 2 == 0)
                Files.write(path1, String.format("%s\n%s\n+\n%s", line.split("\t")[0] + ".1", line.split("\t")[9], line.split("\t")[10]).getBytes(), StandardOpenOption.APPEND);
            else
                Files.write(path2, String.format("%s\n%s\n+\n%s", line.split("\t")[0] + ".2", line.split("\t")[9], line.split("\t")[10]).getBytes(), StandardOpenOption.APPEND);
        } catch (IOException ioe) {
        }
    }
});
Finally, I have tried using an InputStream and Scanner to read the file and PrintWriter to output the lines:
inputStream = new FileInputStream(inputFile);
sc = new Scanner(inputStream, "UTF-8");
String line1, line2;
PrintWriter one = new PrintWriter(new FileOutputStream(dotOne));
PrintWriter two = new PrintWriter(new FileOutputStream(dotTwo));
while (sc.hasNextLine()) {
    line1 = sc.nextLine();
    if (!line1.startsWith("@")) {
        line2 = sc.nextLine();
        one.println(String.format("%s\n%s\n+\n%s", line1.split("\t")[0] + ".1", line1.split("\t")[9], line1.split("\t")[10]));
        two.println(String.format("%s\n%s\n+\n%s", line2.split("\t")[0] + ".2", line2.split("\t")[9], line2.split("\t")[10]));
    }
}
The issue I'm facing is that the program seems to be holding either the data to be written or the input file data in RAM.
All of the above methods work, but they use more RAM than I'd like.
Thanks in advance,
Sam