1

I have a question about which collection to use. I have discussed it a lot but would like more input.

I have a source system from which 100,000s of trade files arrive at my application roughly every 30 minutes. Each file has many lines (say 1,000). My app should store and process only the last 10 lines of trade details.

If I read the file contents line by line with a BufferedReader, I have to keep adding each line to some collection and, once I reach the last line, somehow remove everything except the last 10 lines. Keeping all 1,000 lines in a collection when I do not need them is a performance issue. Is there a collection or an approach that improves on this?

Mark Rotteveel
Suvasis
  • Lakh is not a globally recognized unit. To reach a broader audience you should probably use 100 thousand or .1 million. –  Aug 23 '13 at 09:42
  • Why do you think you need to store all the lines in a collection? You could also just store the last 10 lines read, and every time you read a line, discard the oldest one. – Jesper Aug 23 '13 at 09:43
  • I have seen a maximum of 96k and a minimum of 23k files in one 30-minute interval. The number varies and may increase. We can assume 96k. – Suvasis Aug 23 '13 at 09:43
  • 1. Open file. 2. Seek to end of file. 3. Collect lines moving backward until you have 10 of them. 4. Process. 5. Rinse, repeat. No need for storing up masses of data you don't care about. – T.J. Crowder Aug 23 '13 at 09:45
  • @Crowder : How do we start reading a file from the end of file in java? – Suvasis Aug 23 '13 at 09:50
  • @T.J.Crowder Character encoding could be a problem here. – Thomas Aug 23 '13 at 10:05

5 Answers

2

You can use a CircularFifoBuffer from Apache Commons Collections:

CircularFifoBuffer is a first in first out buffer with a fixed size that replaces its oldest element if full.

Usage for keeping in memory only the last 10 lines:

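CircularFifoBuffer buffer = new CircularFifoBuffer(10);
// reader is assumed to be an already opened BufferedReader
for (String line; (line = reader.readLine()) != null;) {
    buffer.add(line); // once the buffer is full, add() replaces the oldest line
}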

At the end of reading the lines, the buffer only contains the last 10 lines.

Jean Logeart
  • I will go through CircularFifoBuffer in detail. Does it mean that it keeps overwriting earlier elements as we add new ones, so that we are finally left with 10 elements? – Suvasis Aug 23 '13 at 09:48
  • Yes. At the end the buffer only contains the last 10 lines. – Jean Logeart Aug 23 '13 at 09:49
  • Can we somehow avoid reading all the lines in the file? I mean directly accessing the last ten lines. – Suvasis Aug 23 '13 at 09:52
  • As other people suggest, you can, by starting at the end and going backwards. But it will require more coding on your side. If there are only 1,000 lines in a file, reading them all does not really matter: it will be very efficient and you won't notice any improvement. – Jean Logeart Aug 23 '13 at 09:55
1

Use a RandomAccessFile, and try ever larger buffers to read. I made a tail function with a line-length hint, to make a guess at the buffer size. Be aware that whether or not the file ends with a newline may make a difference in the result. The code can also be improved upon (power-of-two block sizes and so on).

        File textFile = new File("...");
        String[] lines = tail(textFile, "UTF-8", 10, 160);
        System.out.println("#Lines: " + lines.length);
        for (String line : lines) {
            System.out.println(line);
        }


// Needs java.io.File, java.io.IOException and java.io.RandomAccessFile.
String[] tail(File textFile, String charSet, int lines, int lineLengthHint)
        throws IOException {
    if (lineLengthHint < 80) {
        lineLengthHint = 80;
    }
    RandomAccessFile in = new RandomAccessFile(textFile, "r");
    try {
        long fileSize = in.length();
        int bytesCount = lines * lineLengthHint;
        // Loop allocating a byte array hopefully sufficiently large.
        for (;;) {
            if (fileSize < bytesCount) {
                bytesCount = (int)fileSize;
            }
            byte[] bytes = new byte[bytesCount];
            in.seek(fileSize - bytesCount);
            in.readFully(bytes);

            int startIndex = bytes.length; // Position of last '\n'.
            int lineEndsFromStart = 0;
            boolean bytesCountSufficient = true;
            while (lineEndsFromStart - 1 < lines) {
                int pos = startIndex - 1;
                while (pos >= 0 && bytes[pos] != '\n') {
                    --pos;
                }
                startIndex = pos; // -1 will do fine.
                ++lineEndsFromStart;
                if (pos < 0) {
                    bytesCountSufficient = false;
                    break;
                }
            }
            if (bytesCountSufficient || fileSize == bytesCount) {
                String text = new String(bytes, startIndex + 1,
                    bytes.length - (startIndex + 1), charSet);
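                String[] result = text.split("\r?\n");
                // A file without a final newline yields one extra leading line; keep the last `lines`.
                return java.util.Arrays.copyOfRange(result,
                    Math.max(0, result.length - lines), result.length);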
            }
            // Not bytesCountSufficient:
            //lineLengthHint += 10; // Average line length was larger.
            bytesCount += lineLengthHint * 4; // Try with more.
        }
    } finally {
        in.close();
    }
}
Joop Eggen
0

You could easily fashion a discarding queue which keeps only the last 10 lines. A LinkedList would be a good start for such an implementation. See this previous question on the topic.

This won't solve the problem of reading in the whole file, but getting around that means quite a bit more coding. You'd need a RandomAccessFile and would have to search for the 10th newline from the end. The appropriateness of this solution depends on how big the files are.
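
A minimal sketch of such a discarding queue, built on a LinkedList as suggested above. The class name DiscardingQueue and its methods are hypothetical, chosen only for illustration; they come neither from this answer nor from the linked question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper, named here only for illustration.
final class DiscardingQueue<E> {
    private final LinkedList<E> items = new LinkedList<E>();
    private final int capacity;

    DiscardingQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Appends an element; if the queue is already full, the oldest one is dropped first.
    void add(E element) {
        if (items.size() == capacity) {
            items.removeFirst();
        }
        items.addLast(element);
    }

    // Returns a copy of the retained elements, oldest first.
    List<E> toList() {
        return new ArrayList<E>(items);
    }

    // Reads every line from the reader but keeps only the last `capacity` lines.
    static List<String> lastLines(BufferedReader reader, int capacity) throws IOException {
        DiscardingQueue<String> queue = new DiscardingQueue<String>(capacity);
        for (String line; (line = reader.readLine()) != null;) {
            queue.add(line);
        }
        return queue.toList();
    }
}
```

Calling DiscardingQueue.lastLines(reader, 10) on an open BufferedReader would then return at most 10 lines, while never holding more than 10 in memory at once. The whole file is still read, as noted above.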

Community
Marko Topolnik
0

You could use a String array of size 10 and always store only the last 10 lines:

BufferedReader in = ...
String[] buffer = new String[10];
int bufferStartIndex = 0;
for (String line; (line = in.readLine()) != null;) {
    buffer[bufferStartIndex++ % buffer.length] = line;
}

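At the end of the for-loop, bufferStartIndex holds the total number of lines read, and bufferStartIndex % buffer.length is the index of the first of the last 10 lines of the file. However, if the file contains fewer than 10 lines, the first line is at index 0 and only the first bufferStartIndex entries are filled.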
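
To read the retained lines back in file order, the array has to be unrolled starting at the oldest slot. The following is a minimal sketch of that step; the helper name RingTail.lastLines and the List return type are assumptions made for illustration and are not part of the snippet above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named here only for illustration.
final class RingTail {
    // Returns the last n lines from the reader, oldest first, using a fixed-size array.
    static List<String> lastLines(BufferedReader in, int n) throws IOException {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        String[] buffer = new String[n];
        long total = 0; // number of lines read so far
        for (String line; (line = in.readLine()) != null;) {
            buffer[(int) (total++ % n)] = line; // overwrite the oldest slot
        }
        int kept = (int) Math.min(total, n);
        // With fewer than n lines nothing was overwritten, so the oldest line is at index 0.
        int start = total < n ? 0 : (int) (total % n);
        List<String> result = new ArrayList<String>(kept);
        for (int i = 0; i < kept; i++) {
            result.add(buffer[(start + i) % n]);
        }
        return result;
    }
}
```

The counter is a long here only so that it cannot overflow on very long inputs; for files of about 1,000 lines an int works just as well.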

Thomas
-1
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.LinkedList;
import java.util.Queue;

public class Test {
    private static Queue<String> bottom=new LinkedList<String>();
    private static int count=0;

    public static void main(String[] args) throws IOException{
        func(3);
    }

    //prints the total line count and the bottom n lines of abc.txt
    private static void func(int n) throws IOException{
        //try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("abc.txt")))) {
            String strLine;

            //Read File Line By Line
            while ((strLine = br.readLine()) != null){
                count++;
                bottom.add(strLine);
                if (bottom.size() > n) {
                    //drop the oldest line so that at most n remain
                    bottom.remove();
                }
            }
        }
        System.out.println(count);
        System.out.println(bottom.toString());
    }
}

I have used a Queue to keep the bottom n lines. For further details you can visit: http://blog.everestkc.com.np

babueverest