I am processing a number of text files line by line using BufferedReader.readLine().
Two of the files have the same size, 130 MB, but one takes 40 sec to process while the other takes 75 sec.
I noticed that one file has 1.8 million lines while the other has 2.1 million. But when I tried to process a file with 3.0 million lines, again of the same size, it took 30 minutes.
So my questions are:

1. Is this behavior because of the seek time of BufferedReader? I want to know how BufferedReader works and how it parses the file line by line.
2. Is there any way I can read the file line by line faster?
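For the second question, one thing I was wondering about is whether giving the reader a bigger buffer would change anything. Here is a minimal sketch of what I mean (the file name and the 8 MB size are just placeholders; as far as I know the default buffer is 8192 chars):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BigBufferRead {
    public static void main(String[] args) throws IOException {
        // the second constructor argument is the buffer size in chars (8 MB here is arbitrary)
        BufferedReader br = new BufferedReader(new FileReader("input.txt"), 8 * 1024 * 1024);
        String line;
        while ((line = br.readLine()) != null) {
            // same per-line processing as in the code below
        }
        br.close();
    }
}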
Ok friends, I am providing some more details.
I am splitting each line into three parts using a regex, and then, using SSTableSimpleUnsortedWriter (provided by Cassandra), I am writing it to a file as key, column and value. After 16 MB of data has been buffered it is flushed to disk.
But the processing logic is the same for all the files, and even a 330 MB file with fewer lines (around 1 million) gets processed in 30 sec. What could be the reason?
deviceWriter = new SSTableSimpleUnsortedWriter(
        directory,
        keyspace,
        "Devices",
        UTF8Type.instance,
        null,
        16);
Pattern pattern = Pattern.compile("[\\[,\\]]");
while ((line = br.readLine()) != null)
{
    // split the line into row key, column name and value
    long timestamp = System.currentTimeMillis() * 1000;
    deviceWriter.newRow(bytes(rowKey));
    deviceWriter.addColumn(bytes(colmName), bytes(value), timestamp);
}
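Once the whole file has been read I close the writer (not shown above), which should write out whatever is still sitting in the buffer:

// after the loop: close the writer so any remaining buffered data is written out
deviceWriter.close();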
I have changed -Xmx256M to -Xmx1024M, but it is not helping.
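For reference, the job is launched roughly like this (the class name and file name are just placeholders):

java -Xmx1024M FileToSSTableLoader input.txt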
Update: From what I have observed, as I am writing into the buffer (in physical memory), the more writes that have gone into the buffer, the longer the newer writes are taking. (This is my guess.)
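To check this guess, I am thinking of timing the reads on their own (without the Cassandra writes) in blocks of 100,000 lines, roughly like this (the file name and the interval are arbitrary):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadTimingCheck {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("input.txt"));
        String line;
        long lines = 0;
        long batchStart = System.currentTimeMillis();
        while ((line = br.readLine()) != null) {
            lines++;
            // print how long each block of 100,000 lines took, to see whether later
            // blocks really are slower than earlier ones
            if (lines % 100000 == 0) {
                long now = System.currentTimeMillis();
                System.out.println(lines + " lines, last block took " + (now - batchStart) + " ms");
                batchStart = now;
            }
        }
        br.close();
        System.out.println("total lines: " + lines);
    }
}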
Please reply.