100

I have a big file. It includes approximately 3.000-20.000 lines. How can I get the total count of lines in the file using Java?

Telemachus
firstthumb
  • 1
    Judging from your comments to answers, the word you are looking for is 'efficient', not 'effective'. – AakashM Aug 14 '09 at 13:51
  • @Firstthumb: Please don't delete comments *after* people have responded to them. It makes the thread confusing for people who arrive late to the show. – Telemachus Aug 14 '09 at 13:57
  • Why? 20,000 lines is not big. Millions is big. Why do you think you need to know the number of lines at all? If you do, you can count them as you process them. You have to read the entire file just to count the lines. You may as well do something useful at the same time. – user207421 Dec 17 '15 at 23:56

15 Answers

143
int lines = 0;
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
    while (reader.readLine() != null) lines++;
}

Update: To answer the performance question raised here, I took a measurement. First, 20,000 lines are too few to keep the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java without parameters such as -server or -XX options) needed around 11 seconds on my box. wc -l (the UNIX command-line tool for counting lines) also took 11 seconds. The solution that reads every single character and looks for '\n' needed 104 seconds, 9-10 times as long.

Mnementh
  • What efficiency do you mean? Performance? In that case there is no better way: because lines can have different lengths, you have to read the complete file to count the lines (wc does it too). If you mean programming efficiency, then I'm sure you can put it in a utility method (or some common library has done it already). – Mnementh Aug 14 '09 at 13:55
  • @Firstthumb. Not efficient maybe, but who cares. He's only counting 20k lines which is pretty small. This code gets my vote for being the simplest. – Chris Dail Aug 14 '09 at 13:55
  • how about the efficiency of LineNumberReader since it extends BufferedReader? – Narayan Aug 15 '09 at 07:56
  • Nobody says this is better than the LineNumberReader, at least I don't do it. – Mnementh Aug 15 '09 at 08:35
  • 1
    next question? why don't you do it :D – Narayan Aug 15 '09 at 08:41
  • I was fairly sure that the BufferedReader would be at least as fast as using a FileReader and inspecting every single character. I confirmed that by measuring the time (and showed that inspecting every char is far slower). But I think the LineNumberReader solution will work as well as the one with the BufferedReader. That's why I upvoted that answer. – Mnementh Aug 15 '09 at 10:05
  • 2
    Inspecting every byte should be definitely faster (when using a buffer) because FileReader must decode the bytes to text. – fhucho Jun 05 '13 at 10:09
  • For modern Java, the [Answer by Augustin](https://stackoverflow.com/a/35523560/642706) should be the accepted Answer. Uses `Files.lines`. – Basil Bourque Jan 05 '19 at 01:29
  • @Mnementh: nitpicking: naming the variable `lines` is misleading given that it holds the int count, and not the actual lines. `count`, `linesRead` or `numLines` would be more obvious. – ccpizza Mar 17 '20 at 06:58
98

Files.lines

Java 8+ offers a short way to do this with NIO's Files.lines. Note that you have to close the stream, for example with try-with-resources:

long lineCount;
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
  lineCount = stream.count();
}

If you don't specify the character encoding, the default one used is UTF-8. You may specify an alternate encoding to match your particular data file as shown in the example above.
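For reference, here is a minimal self-contained sketch of the snippet above with the imports it needs. The class name FilesLinesCount, the helper countLines and the placeholder file name file.txt are illustrative assumptions, not part of the original answer:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class FilesLinesCount {

    // Hypothetical helper: counts the lines of a file decoded with the given charset.
    static long countLines(Path path, Charset charset) throws IOException {
        // try-with-resources closes the stream and the file handle behind it
        try (Stream<String> stream = Files.lines(path, charset)) {
            return stream.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder; pass a real path as the first argument
        Path path = Paths.get(args.length > 0 ? args[0] : "file.txt");
        System.out.println(countLines(path, StandardCharsets.UTF_8));
    }
}
```

With this approach a final line that lacks a trailing newline still counts as a line, so "a\nb\nc" and "a\nb\nc\n" both give 3.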

Augustin
  • 2
    Bad solution. We can have a problem with the charset – Mikhail May 25 '16 at 10:12
  • 2
    charset is UTF-8 by default – Alex Yursha Oct 24 '16 at 23:05
  • 1
    @Mikhail Pass the character encoding of your particular data file as a `Charset` object in the optional second argument. See: [`Files.lines(Path path, Charset cs)`](https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#lines-java.nio.file.Path-java.nio.charset.Charset-). The default is UTF-8; for other encodings pass the `Charset`. – Basil Bourque Jan 05 '19 at 01:31
  • 7
    Files.lines(path).count(); should not be used directly. Instead use try-with-resources. Example: long lineCount; try (Stream<String> linesStream = Files.lines(path)) { lineCount = linesStream.count(); } – aprodan Apr 16 '19 at 15:27
  • 1
    Take care though, `path` isn't closed. :-/ – Eric Duminil Mar 12 '20 at 12:40
  • I don't know if this code is very efficient. Big files will use all the memory heap available and cause "java heap space" errors on `stream.count()` – Dherik Aug 10 '20 at 12:59
  • 1
    This works as long as you know the charset of the file. If you don't, it will fail on an unknown encoding. – Sagar Nov 04 '21 at 17:07
  • If the charset does not match with the file, it throws an exception. Handle with care. – Lluis Martinez Feb 08 '23 at 20:34
33

Use LineNumberReader, something like:

public static int countLines(File aFile) throws IOException {
    LineNumberReader reader = null;
    try {
        reader = new LineNumberReader(new FileReader(aFile));
        while ((reader.readLine()) != null);
        return reader.getLineNumber();
    } catch (Exception ex) {
        return -1;
    } finally { 
        if(reader != null) 
            reader.close();
    }
}
om-nom-nom
Narayan
16

I found a solution for this that might be useful to you.

Below is a code snippet that counts the number of lines in a file:

  File file = new File("/mnt/sdcard/abc.txt");
  LineNumberReader lineNumberReader = new LineNumberReader(new FileReader(file));
  lineNumberReader.skip(Long.MAX_VALUE);
  int lines = lineNumberReader.getLineNumber();
  lineNumberReader.close();
MarCrazyness
brig
5

Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.
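A minimal sketch of the Scanner approach described above. The class name ScannerLineCount, the helper countLines and the placeholder file name file.txt are illustrative assumptions; note that Scanner(File) decodes the file with the platform default charset:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ScannerLineCount {

    // Hypothetical helper: counts lines by consuming them one at a time.
    static int countLines(File file) throws FileNotFoundException {
        int count = 0;
        // try-with-resources closes the scanner and the file behind it
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine(); // discard the text, only the count matters
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "file.txt" is a placeholder; pass a real path as the first argument
        File file = new File(args.length > 0 ? args[0] : "file.txt");
        System.out.println(countLines(file));
    }
}
```

Scanner is convenient rather than fast, so for very large files one of the buffered approaches in the other answers is probably the better choice.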

Esko Luontola
5

This is about as efficient as it can get: a buffered binary read with no string conversion.

FileInputStream stream = new FileInputStream("/tmp/test.txt");
byte[] buffer = new byte[8192];
int count = 0;
int n;
while ((n = stream.read(buffer)) > 0) {
    for (int i = 0; i < n; i++) {
        if (buffer[i] == '\n') count++;
    }
}
stream.close();
System.out.println("Number of lines: " + count);
ZZ Coder
5

Do you need the exact number of lines, or only an approximation? I process large files in parallel and often don't need the exact line count, so I fall back to sampling. Split the file into ten 1MB chunks and count lines in each chunk, then multiply it by 10 and you'll get a pretty good approximation of the line count.
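One way to sketch the sampling idea in Java is shown below. This is an interpretation, not the answerer's code: it reads evenly spaced chunks and scales the newline count by the ratio of file size to bytes sampled, instead of using a fixed factor of 10. The class name LineCountEstimator, the helper estimateLines and its parameters are illustrative assumptions:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class LineCountEstimator {

    // Hypothetical helper: estimates the line count from roughly `samples` chunks
    // of `chunkSize` bytes spread evenly over the file. Both must be positive.
    static long estimateLines(String fileName, int samples, int chunkSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long fileSize = raf.length();
            if (fileSize == 0) {
                return 0;
            }
            byte[] buffer = new byte[chunkSize];
            long sampledBytes = 0;
            long newlines = 0;
            // distance between chunk starts; at least chunkSize so chunks never overlap
            long stride = Math.max(chunkSize, fileSize / samples);
            for (long offset = 0; offset < fileSize; offset += stride) {
                raf.seek(offset);
                int n = raf.read(buffer, 0, chunkSize);
                if (n <= 0) {
                    break;
                }
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') newlines++;
                }
                sampledBytes += n;
            }
            // scale the sampled newline density up to the whole file
            return Math.round((double) newlines * fileSize / sampledBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // ten chunks of 1 MB, as in the answer; args[0] is the file to sample
        System.out.println(estimateLines(args[0], 10, 1024 * 1024));
    }
}
```

The estimate is only as good as the assumption that line lengths in the sampled chunks are representative of the whole file.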

matt
4

All previous answers suggest reading through the whole file and counting the newlines you find along the way. You commented on some of them as "not effective", but that's the only way you can do it. A line break is nothing more than a character inside the file, and to count that character you have to look at every single character in the file.

I'm sorry, but you have no choice. :-)

Malax
3

This solution is about 3.6× faster than the top rated answer when tested on a file with 13.8 million lines. It simply reads the bytes into a buffer and counts the \n characters. You could play with the buffer size, but on my machine, anything above 8KB didn't make the code faster.

private static final int BUFFER_SIZE = 8 * 1024;

private int countLines(File file) throws IOException {
    int lines = 0;
    byte[] buffer = new byte[BUFFER_SIZE];
    int read;

    // try-with-resources closes the stream even if a read fails
    try (FileInputStream fis = new FileInputStream(file)) {
        while ((read = fis.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') lines++;
            }
        }
    }

    return lines;
}
fhucho
  • I wonder if using a pre-compiled RegEx Pattern would make it faster or slower. What it would do is work with all line endings, I believe. And, I think it might make it faster, too. – ingyhere Nov 18 '13 at 01:46
  • Some of the above solutions can take advantage of buffering, also, if the benefits would help. For instance, "new LineNumberReader(new FileReader(theFilePathStr), 8096)" or something. – ingyhere Nov 18 '13 at 01:48
  • Be careful about character encodings... – Rag Sep 02 '16 at 20:40
2

If the already posted answers aren't fast enough you'll probably have to look for a solution specific to your particular problem.

For example if these text files are logs that are only appended to and you regularly need to know the number of lines in them you could create an index. This index would contain the number of lines in the file, when the file was last modified and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and just reading the new lines.
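A rough sketch of such an index, assuming the file is only ever appended to. For brevity it keeps the state in memory and tracks only the file size and line count, not the modification time; persisting the index is left out. The class name LineCountIndex and the method update are illustrative assumptions:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class LineCountIndex {

    private long indexedBytes = 0; // file size when the index was last updated
    private long indexedLines = 0; // newline characters seen in those bytes

    // Reads only the bytes appended since the last call and returns the new total.
    long update(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            if (raf.length() < indexedBytes) {
                // the file shrank (for example after log rotation), so start over
                indexedBytes = 0;
                indexedLines = 0;
            }
            raf.seek(indexedBytes); // skip everything that was already counted
            byte[] buffer = new byte[8192];
            int n;
            while ((n = raf.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') indexedLines++;
                }
                indexedBytes += n;
            }
            return indexedLines;
        }
    }

    public static void main(String[] args) throws IOException {
        LineCountIndex index = new LineCountIndex();
        // the first call reads the whole file; later calls read only the new tail
        System.out.println(index.update(args[0]));
    }
}
```

The size check only catches a file that became shorter. A file that was rewritten in place to the same or a larger size would need the modification-time check the answer mentions.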

blackNBUK
2

Old post, but I have a solution that could be useful for the next people. Why not just use the file length to track the progress? Of course, the lines have to be almost the same size, but it works very well for big files:

public static void main(String[] args) throws IOException {
    File file = new File("yourfilehere");
    double fileSize = file.length();
    System.out.println("=======> File size = " + fileSize);
    InputStream inputStream = new FileInputStream(file);
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "iso-8859-1");
    BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
    long totalRead = 0; // long, so files over 2 GB do not overflow the counter
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            // LINE PROCESSING HERE
            totalRead += line.length() + 1; // we add +1 byte for the newline char.
            System.out.println("Progress ===> " + ((totalRead / fileSize) * 100) + " %");
        }
    } finally {
        bufferedReader.close();
    }
}

It lets you see the progress without first doing a full read of the file. I know it depends on a lot of factors, but I hope it will be useful :).

[Edit] Here is a version with an estimated time. I added some System.out calls to show the progress and the estimate. The time estimate becomes accurate once enough lines have been processed (I tried with 10M lines, and after 1% of the processing the estimate was about 95% accurate). I know some values should be pulled out into variables. This code was written quickly but has been useful for me. I hope it will be for you too :).

    long startProcessLine = System.currentTimeMillis();
    long totalRead = 0; // long, so files over 2 GB do not overflow the counter
    long progressTime = 0;
    double percent = 0;
    int i = 0;
    int j = 0;
    int fullEstimation = 0;
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            totalRead += line.length() + 1;
            progressTime = System.currentTimeMillis() - startProcessLine;
            percent = (double) totalRead / fileSize * 100;
            if ((percent > 1) && i % 10000 == 0) {
                int estimation = (int) ((progressTime / percent) * (100 - percent));
                fullEstimation += progressTime + estimation;
                j++;
                System.out.print("Progress ===> " + percent + " %");
                System.out.print(" - current progress : " + (progressTime) + " milliseconds");
                System.out.print(" - Will be finished in ===> " + estimation + " milliseconds");
                System.out.println(" - estimated full time => " + (progressTime + estimation));
            }
            i++;
        }
    } finally {
        bufferedReader.close();
    }
    System.out.println("Ended in " + (progressTime) + " milliseconds");
    System.out.println("Estimative average ===> " + (fullEstimation / j));
    System.out.println("Difference: " + ((((double) 100 / (double) progressTime)) * (progressTime - (fullEstimation / j))) + "%");

Feel free to improve this code if you think it's a good solution.

lpratlong
1

Quick and dirty, but it does the job:

import java.io.*;

public class Counter {

    public final static void main(String[] args) throws IOException {
        if (args.length > 0) {
            File file = new File(args[0]);
            System.out.println(countLines(file));
        }
    }

    public final static int countLines(File file) throws IOException {
        ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
        Process process = builder.start();
        InputStream in = process.getInputStream();
        LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
        String line = reader.readLine();
        if (line != null) {
            return Integer.parseInt(line.trim().split(" ")[0]);
        } else {
            return -1;
        }
    }

}
Wilfred Springer
0

Read the file line by line and increment a counter for each line until you have read the entire file.

Ken Liu
-1

Try the Unix "wc" command. I don't mean use it; I mean download the source and see how they do it. It's probably in C, but you can easily port the behavior to Java. The problem with making your own is accounting for the ending CR/LF problem.
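A minimal sketch of one way to handle the ending problem mentioned above, not a port of the wc source: count '\n' bytes as wc -l does, then add one if the file ends with a line that has no terminator. The class name WcStyleCount and the helper countLines are illustrative assumptions, and a lone '\r' is not treated as a line ending here:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class WcStyleCount {

    // Hypothetical helper: counts '\n' bytes and also counts a final line
    // that is not terminated by a newline.
    static long countLines(String fileName) throws IOException {
        long newlines = 0;
        int last = -1; // last byte seen as 0..255, or -1 if the file is empty
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(fileName)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') newlines++;
                }
                if (n > 0) {
                    last = buffer[n - 1] & 0xFF; // mask so a 0xFF byte is not mistaken for -1
                }
            }
        }
        boolean unterminatedLastLine = last != -1 && last != '\n';
        return unterminatedLastLine ? newlines + 1 : newlines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(args[0]));
    }
}
```

Files with "\r\n" endings are counted correctly because each ending still contains one '\n'.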

Daniel
-2

The buffered reader is overkill

Reader r = new FileReader("f.txt");

int count = 0;
int nextchar = 0;
while (nextchar != -1) {
    nextchar = r.read();
    // compare with the character itself; Character.getNumericValue('\n') is -1
    if (nextchar == '\n') {
        count++;
    }
}
r.close();

My search for a simple example has created one that's actually quite poor. Calling read() repeatedly for a single character is less than optimal. See here for examples and measurements.

NSherwin
  • 2
    The BufferedReader handles different line endings well. Your solution ignores Mac line endings ('\r'). That may be OK. Anyway, your solution doesn't actually read from the file at the moment. I think you forgot a line. – Mnementh Aug 14 '09 at 13:58
  • 5
    What's going to change nextchar here? If you're going to call read() on every iteration, I strongly suspect that a BufferedReader approach will be *much* faster... – Jon Skeet Aug 14 '09 at 13:59
  • that was the idea ;-/ I wanted to write the simplest possible example. I wonder what the speed difference would be? – NSherwin Aug 14 '09 at 14:01
  • 3
    BufferedReader is not overkill here. The code in this answer will be hideously slow - FileReader.read() will pull one character at a time from the file. – skaffman Aug 14 '09 at 14:06
  • 1
    And the answer is 'Dramatic' examples given here http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/ – NSherwin Aug 14 '09 at 14:06
  • I measured it on my box, Jon Skeet is right, the difference is big. I added the measurements in my answer. – Mnementh Aug 14 '09 at 14:25
  • @nsherwin Dead link – Lluis Martinez May 13 '22 at 17:03