
I have to read an 8 GB .log file to extract some information, but the file contains many lines that I don't need. Some of them are extremely long (more than 15,000,000 characters), which slows the code down so much that reading the whole file takes more than a day, even without doing any other processing.

I need something that reads the first word of each line and, if it starts with a specific sequence, skips the rest of that line without reading its characters.

I tried Scanner.skip, but since it skips input that matches a pattern, it has to read the line to match it. That way it still reads an extremely long sequence of characters, which makes the program too slow.

This is my code so far:

            File logFile = new File(logFilePath);
            Scanner fileScanner = new Scanner(logFile);

            while (fileScanner.hasNextLine()) {
                String currentLine = fileScanner.next();          
                if (currentLine.equals("messaggio:")) {
                    fileScanner.skip("\n");             // This is where I want to skip the line WITHOUT reading it
                }
                else {
                    // Other code
                }
            }

            fileScanner.close();
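A minimal, self-contained sketch of why skip("\n") does not do what the comment in the code hopes (the class and method names are hypothetical, and a short string stands in for the real log). Per the Scanner documentation, skip only succeeds if the pattern matches at the current position; it does not search ahead. After next() returns "messaggio:", the next character is normally a space, so skip("\n") throws NoSuchElementException. Discarding the rest of the line with nextLine() works, but it still reads every remaining character into a String.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

public class SkipDemo {

    // Reads the first token, then tries skip("\n") the way the question's code does.
    // Returns true if the skip succeeded, false if Scanner threw NoSuchElementException.
    static boolean skipNewlineAfterFirstToken(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.next();                      // e.g. "messaggio:"
            try {
                // skip() only matches at the CURRENT position; it does not search ahead.
                sc.skip("\n");
                return true;
            } catch (NoSuchElementException e) {
                return false;               // next char was not '\n', nothing was skipped
            }
        }
    }

    // nextLine() does discard the rest of the line, but only by reading
    // every remaining character and building a String from them.
    static String restOfLineAfterFirstToken(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.next();
            return sc.nextLine();
        }
    }

    public static void main(String[] args) {
        String log = "messaggio: very long payload\nnormal line\n";
        System.out.println(skipNewlineAfterFirstToken(log));   // false
        System.out.println(restOfLineAfterFirstToken(log));    // " very long payload"
    }
}
```

The skip only succeeds in the special case where the first token is immediately followed by the line separator, which is not the situation described in the question.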
Dmitriy Popov
  • There's no technology I can think of that would let you "skip lines" the way you intend. How would you know where "a line" ends without checking for line separators? – daniu Jul 03 '23 at 09:28
  • Do the lines have the same length, or does their lengths follow a specific pattern? – Sweeper Jul 03 '23 at 09:28
  • @Sweeper No, the lines don't have the same length. – SAMUELE BERNARDI Jul 03 '23 at 09:30
  • Please share the code you use for reading lines; maybe we can help improve its performance, for example with a BufferedReader. – Mar-Z Jul 03 '23 at 09:33
  • Looks like Java is not the right programming language for what you are trying to do. – Jens Jul 03 '23 at 09:38
  • Use multiple threads? Refer to [How to read a file using multiple threads in Java when a high throughput(3GB/s) file system is available](https://stackoverflow.com/questions/40412008/how-to-read-a-file-using-multiple-threads-in-java-when-a-high-throughput3gb-s) – Abra Jul 03 '23 at 09:40
  • Not an answer to your question, but I think you could start looking into log rotation and whether you are logging more than needed. Are you trying to find out whether a specific event occurred? Could you briefly tell us what you want to do with the logs? Also, it baffles me what kind of log would have 15 million characters in one line. – user3437460 Jul 03 '23 at 09:41

1 Answer


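OK. Using the Scanner API for this is a bad idea: Scanner tokenizes its input with regular expressions, which is slow on a file of this size, and a BufferedReader is much faster. Try the following: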

Solution

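        // Needs: import java.io.BufferedReader; import java.io.FileReader;
        try (BufferedReader in
                = new BufferedReader(new FileReader("data/test.log"))) {
            in.lines().parallel()
                    .filter(l -> !l.startsWith("messaggio:"))  // drop the unwanted lines
                    .forEach(TestApplication::doSomething);
        }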
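A different technique, sketched here as an assumption rather than as part of this answer: work on raw bytes, compare only the first few bytes of each line with the prefix, and if they match, consume the rest of the line up to '\n' without storing it. As the first comment on the question points out, there is no way to find the end of a line without scanning for the separator, so this still reads every byte from disk; what it avoids is decoding the skipped lines and building multi-million-character Strings from them. It assumes '\n' line endings (a '\r' would stay at the end of kept lines), an ASCII prefix such as "messaggio:", and UTF-8 content. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PrefixSkippingReader {

    // Returns every line that does NOT start with `prefix`. Lines that do start
    // with it are consumed byte by byte up to '\n' without being stored.
    // The caller owns (and closes) `raw`.
    static List<String> readLinesSkippingPrefix(InputStream raw, String prefix) throws IOException {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        List<String> kept = new ArrayList<>();
        InputStream in = new BufferedInputStream(raw, 1 << 16);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b = in.read();
        while (b != -1) {
            line.reset();
            // Compare the first bytes of the line with the prefix.
            int matched = 0;
            while (b != -1 && b != '\n' && matched < p.length && b == p[matched]) {
                line.write(b);
                matched++;
                b = in.read();
            }
            if (matched == p.length) {
                // Prefix found: discard the rest of the line without buffering it.
                while (b != -1 && b != '\n') {
                    b = in.read();
                }
            } else {
                // Not an unwanted line: collect the remainder normally.
                while (b != -1 && b != '\n') {
                    line.write(b);
                    b = in.read();
                }
                kept.add(new String(line.toByteArray(), StandardCharsets.UTF_8));
            }
            if (b == '\n') {
                b = in.read();  // move past the line separator
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        String log = "messaggio: huge payload\nkeep me\nmessaggio:x\nalso keep\n";
        InputStream in = new ByteArrayInputStream(log.getBytes(StandardCharsets.UTF_8));
        System.out.println(readLinesSkippingPrefix(in, "messaggio:"));  // [keep me, also keep]
    }
}
```

For the real file you would pass Files.newInputStream(Paths.get(logFilePath)) opened in a try-with-resources block. Collecting the kept lines into a List is only for the demo; with 8 GB of input you would more likely process each kept line as soon as it is complete.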

Mar-Z