14

I have a text file with about 2,000,000 lines. Going through the whole file with the following code snippet is easy, but that's not what I need ;-)

def f = new File("input.txt")
f.eachLine() {
    // Some code here
}

I need to read only a specific range of lines from the file. Is there a way to specify the start and end line like this (pseudo-code)? I'd like to avoid loading all lines into memory with readLines() before selecting the range.

// Read all lines from 4 to 48
def f = new File("input.txt")
def start = 4
def end = 48
f.eachLine(start, end) {
    // Some code here
}

If this is not possible with Groovy, any Java solution is welcome as well :-)

Cheers, Robert

Robert Strauch

9 Answers

9

The Java solution:

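// Line numbers are 0-based here; try-with-resources closes the reader.
try (BufferedReader r = new BufferedReader(new FileReader(f))) {
    String line;
    // Check the bound first so that no line past end is read.
    for (int ln = 0; ln <= end && (line = r.readLine()) != null; ln++) {
        if (ln >= start) {
            // Some code here
        }
    }
}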

Gross, eh?

Unfortunately, unless your lines are fixed length, you can't skip to line number start efficiently: each line could be arbitrarily long, so everything before it has to be read to find the line breaks. That doesn't preclude a nicer solution though.

Java 8

Thought it was worth an update to show how to do this efficiently with Streams:

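import java.nio.file.*;
import java.util.stream.Stream;

int start = 5;
int end = 12;
Path file = Paths.get("/tmp/bigfile.txt");

// skip(start) drops the first start lines, so this prints the 1-based lines start+1 through end.
try (Stream<String> lines = Files.lines(file)) {
    lines.skip(start).limit(end - start).forEach(System.out::println);
}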

Because Streams are lazily evaluated, it will only read lines up to and including end (plus whatever internal buffering it chooses to do).
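The skip/limit pair can also be wrapped in a small helper that takes a 1-based, inclusive range, like the one in the question. This is only a sketch: the class name LineRange and the method readRange are made up for illustration, and it assumes the file is UTF-8, which is what Files.lines decodes by default.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of any library.
class LineRange {

    // Returns lines start..end (1-based, inclusive); fewer if the file is shorter.
    static List<String> readRange(Path file, long start, long end) throws IOException {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end");
        }
        try (Stream<String> lines = Files.lines(file)) {
            return lines.skip(start - 1)          // drop lines 1..start-1
                        .limit(end - start + 1)   // keep at most end-start+1 lines
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("range", ".txt");
        Files.write(tmp, Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(readRange(tmp, 2, 4)); // prints [b, c, d]
        Files.delete(tmp);
    }
}
```

The lines before start are still read and discarded; the helper only avoids holding them in memory.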

Mark Peters
  • That's about as nice as it gets. Since "line boundary" is a calculated condition, there's no way to jump to that point in the file. – Ari Gesher Nov 03 '10 at 18:33
  • again, not a groovy answer :-( – smartnut007 Nov 03 '10 at 22:24
  • @smartnut007: did you downvote because of that? First of all, Java is valid Groovy code. Second of all, the OP specifically asked for Java alternatives, and the question is tagged first with Java. Thinking you have a better answer is not a good reason to downvote others'. If yours is better, that will get worked out through upvotes. Downvote if something is wrong. – Mark Peters Nov 03 '10 at 23:01
5

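Here's a Groovy solution. Unfortunately, this still reads every line of the file, including those before start and after end.

def start = 4
def end = 48

// The int argument is the number assigned to the first line, not the line at which reading starts.
new File("input.txt").eachLine(1) { line, lineNo ->
    if (lineNo >= start && lineNo <= end) {
        // Process the line
    }
}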
Dónal
4

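Groovy's File now has eachLine overloads that take a firstLine argument. Here are the two signatures from the docs on File (per those docs, firstLine is the number given to the first line, so earlier lines are not skipped):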

Object eachLine(int firstLine, Closure closure) 

Object eachLine(String charset, int firstLine, Closure closure) 
Gangnus
  • From the docs I got the impression that firstLine sets the number assigned to the first line (so you can start counting from 1 or 0), not the line where reading starts. – ajurasz Sep 08 '16 at 12:28
3

I don't believe there is any "magic" way to skip to an arbitrary "line" in a file. Lines are merely defined by newline characters, so without actually reading the file, there is no way to know where those will be. I believe you have two options:

  1. Follow Mark Peters' answer and use a BufferedReader to read the file one line at a time until you reach your desired line. This will be slow if that line is far into the file.
  2. Figure out the byte offset (rather than the line number) your next read needs to start at and seek directly to that point in the file using something like RandomAccessFile. Whether it's possible to know the right offset efficiently depends on your application. For example, if you are reading the file sequentially, one piece at a time, you simply record the position you left off at. If all the lines have a fixed length of L bytes (line terminator included), then getting to line N, counting from 0, is just a matter of seeking to position N*L. If this is an operation you repeat often, some pre-processing might help: for example, read the entire file once and record the starting position of each line in an in-memory HashMap. Next time you need to go to line N, simply look up its position in the HashMap and seek directly to that point.
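The pre-processing idea in option 2 could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name LineIndex and its methods are invented, a list indexed by line number stands in for the HashMap, lines are assumed to end in \n or \r\n, and the text is assumed to be UTF-8.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class LineIndex {
    private final String path;
    // offsets.get(i) is the byte position where line i+1 starts
    private final List<Long> offsets = new ArrayList<>();

    // One sequential pass over the file records where each line starts.
    LineIndex(String path) throws IOException {
        this.path = path;
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                pos++;
            }
        }
    }

    int lineCount() {
        return offsets.size();
    }

    // Returns lines start..end (1-based, inclusive) using a single seek.
    List<String> readRange(int start, int end) throws IOException {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end");
        }
        List<String> result = new ArrayList<>();
        if (start > offsets.size()) {
            return result;
        }
        int last = Math.min(end, offsets.size());
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long from = offsets.get(start - 1);
            long to = last < offsets.size() ? offsets.get(last) : raf.length();
            byte[] buf = new byte[(int) (to - from)];
            raf.seek(from);
            raf.readFully(buf);
            for (int i = start; i <= last; i++) {
                int lo = (int) (offsets.get(i - 1) - from);
                int hi = (int) ((i < offsets.size() ? offsets.get(i) : to) - from);
                // Drop the trailing line terminator, if any.
                while (hi > lo && (buf[hi - 1] == '\n' || buf[hi - 1] == '\r')) {
                    hi--;
                }
                result.add(new String(buf, lo, hi - lo, StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, "one\ntwo\nthree\nfour\n".getBytes(StandardCharsets.UTF_8));
        LineIndex index = new LineIndex(tmp.toString());
        System.out.println(index.readRange(2, 3)); // prints [two, three]
        Files.delete(tmp);
    }
}
```

Building the index still costs one full read of the file, so this only pays off when several ranges are read from the same unchanged file.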
Yevgeniy Brikman
  • @smartnut007 Before downvoting my answer and promoting your own, you may want to re-read the question. Straurob is asking for a way to skip to a specific line N in a file without having to read lines 1-N before it. I discuss why this is problematic and possible workarounds in my answer. The way you do it in your answer may be a nice use of the Groovy syntax, but it requires reading lines 1-N, so it's the _wrong_ answer. – Yevgeniy Brikman Nov 03 '10 at 22:58
  • @smartnut007 You're right that this may be a quite "generic" answer but actually it helped me a lot, especially as I haven't thought about skipping N bytes. That's why I checked this posting as answer. – Robert Strauch Nov 04 '10 at 07:11
  • @straurob Of course Unicode characters could mess the 'skipping N bytes' thing up – tim_yates Nov 04 '10 at 10:56
  • @tim_yates That's correct. However, I'm lucky that this file won't change its encoding :-) – Robert Strauch Nov 05 '10 at 09:39
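
The encoding caveat raised in the comments can be seen in a few lines of Java. This is just an illustration, assuming UTF-8; the class name ByteLengthDemo and the method utf8Length are made up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class ByteLengthDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String accented = "h\u00e9llo"; // same text with an e-acute as the second character
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");       // prints 5 chars, 5 bytes
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes"); // prints 5 chars, 6 bytes
    }
}
```

Two lines with the same number of characters can therefore differ in byte length, which breaks the N*L arithmetic unless every line is known to occupy the same number of bytes.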
2

In Groovy you can use a category:

class FileHelper {
    static eachLineInRange(File file, IntRange lineRange, Closure closure) {
        file.withReader { reader ->
            // withReader supplies a BufferedReader; LineNumberReader adds the 1-based line count
            def r = new LineNumberReader(reader)
            def line
            while ((line = r.readLine()) != null) {
                def lineNo = r.lineNumber
                if (lineNo < lineRange.from) continue
                if (lineNo > lineRange.to) break
                closure.call(line, lineNo)
            }
        }
    }
}

def f = '/path/to/file' as File
use(FileHelper) {
    f.eachLineInRange(from..to){line, lineNo ->
        println "$lineNo) $line"
    }
}

or ExpandoMetaClass:

File.metaClass.eachLineInRange = { IntRange lineRange, Closure closure ->
    delegate.withReader { reader ->
        // withReader supplies a BufferedReader; LineNumberReader adds the 1-based line count
        def r = new LineNumberReader(reader)
        def line
        while ((line = r.readLine()) != null) {
            def lineNo = r.lineNumber
            if (lineNo < lineRange.from) continue
            if (lineNo > lineRange.to) break
            closure.call(line, lineNo)
        }
    }
}

def f = '/path/to/file' as File
f.eachLineInRange(from..to){line, lineNo ->
    println "$lineNo) $line"
}

This solution reads the file sequentially up to the end of the range, but never keeps more than the current line in memory.

Jarek Przygódzki
2

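This should do it. Note that return only exits the current closure call, so eachLine still reads the lines after "end"; they are just skipped.

def readRange = { file ->
    def start = 10
    def end = 20
    def fileToRead = new File(file)
    // eachLine supplies the 1-based line number itself, so no manual lineNo++ is needed
    fileToRead.eachLine { line, lineNo ->
        if (lineNo > end) {
            return  // skips this line only; eachLine keeps iterating
        }
        if (lineNo >= start) {
            println line
        }
    }
}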
Vinay
  • This worked for a different problem I was solving. I couldn't figure out why, but I didn't even do lineNo++ and magically it ++'d itself. – Sundeep Jul 22 '13 at 17:08
1

You have to iterate over the lines from the beginning to get to your starting position, but you can use LineNumberReader (instead of BufferedReader) because it will keep track of the line numbers for you.

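final int start = 4;
final int end = 48;

// try-with-resources closes the reader; getLineNumber() is the 1-based number of the line just read
try (LineNumberReader in = new LineNumberReader(new FileReader(filename))) {
    String line;
    // Check the bound first so that no line past end is read.
    while (in.getLineNumber() < end && (line = in.readLine()) != null) {
        if (in.getLineNumber() >= start) {
            // process line
        }
    }
}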
dogbane
1

Thanks for all your hints. From what you've written I cobbled together my own piece of code, which seems to be working. Not elegant but it serves its purpose :-)

def f = new RandomAccessFile("D:/input.txt", "r")
def start = 3
def end = 6
def current = start-1
// Assumes every line occupies exactly BYTE_OFFSET bytes, line terminator included
def BYTE_OFFSET = 11
def resultList = []

try {
    if ((end*BYTE_OFFSET) <= f.length()) {
        while (current < end) {
            // Line n (1-based) starts at byte (n-1)*BYTE_OFFSET
            f.seek(current*BYTE_OFFSET)
            resultList << f.readLine()
            current++
        }
    }
} finally {
    f.close()
}
Robert Strauch
0

Here's another Java solution using LineIterator and FileUtils from Apache Commons IO:

public static Collection<String> readFile(final File f,
    final int startOffset,
    final int lines) throws IOException {
    final LineIterator it = FileUtils.lineIterator(f);
    try {
        int index = 0;
        final Collection<String> coll = new ArrayList<String>(lines);
        // Skip the first startOffset lines, then collect at most `lines` lines.
        while (index++ < startOffset + lines && it.hasNext()) {
            final String line = it.nextLine();
            if (index > startOffset) {
                coll.add(line);
            }
        }
        return coll;
    } finally {
        // Release the underlying reader even if reading fails.
        it.close();
    }
}
Sean Patrick Floyd