3

I'm trying to loop through a log text file, containing SSH logins and other logs.

The program is returning the total number of SSH logins.

My solution does work but seems a bit slow (~3.5 sec on a 200mo file). I would like to know if there are any ways to make it faster. I'm not really familiar with good practices on Java.

I'm using the BufferedReader class. Maybe there are better classes/methods but everything else I found online was slower.

{
            BufferedReader br;
            if(fileLocation != null) {
                br = new BufferedReader(new FileReader(fileLocation));
            }
            else {
                br = new BufferedReader((new InputStreamReader(System.in, "UTF-8")));
            }
            String line;
            Stack<String> users = new Stack<>();
            int succeeded = 0;
            int failed;
            int total = 0;

            if(!br.ready()) {
                help("Cannot read the file", true);
            }
            while((line=br.readLine())!=null)
            {
                if(!line.contains("sshd")) continue;
                String[] arr = line.split("\\s+");
                if(arr.length < 11) continue;


                String log = arr[4];
                String log2 = arr[5];
                String log3 = arr[8];
                String user = arr[10];
                if(!log.contains("sshd")) continue;
                if(!log2.contains("Accepted")) {
                    if(log3.contains("failure")) {
                        total++;
                    }
                    continue;
                }
                total++;
                succeeded++;

                if(!repeat) {
                    if (users.contains(user)) continue;
                    users.add(user);
                }

                System.out.println((total + 1) + " " + user);
            }

Full code : https://pastebin.com/xp2P9wja

Also, here's some lines of the log file :

Dec  3 12:20:12 k332 sshd[25206]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.147.222.137 
Dec  3 12:20:14 k332 sshd[25204]: error: PAM: Authentication failure for illegal user admin from 10.147.222.137
Dec  3 12:20:14 k332 sshd[25204]: Failed keyboard-interactive/pam for invalid user admin from 10.147.222.137 port 36417 ssh2
Dec  3 12:20:14 k332 sshd[25204]: Connection closed by invalid user admin 10.147.222.137 port 36417 [preauth]
Dec  3 12:20:40 k332 sshd[25209]: pam_tally2(sshd:auth): Tally overflowed for user root

Final output is :

Total :
103 unique IP SSH logins succeeded
30387 SSH logins succeeded
17186 SSH logins failed
47573 total SSH logins

Thanks for your time!

EDIT: Mo (Mega Octet) = MB (Mega Byte) (we usually say Mo in french)

Here's the full updated code is anyone needs it : https://pastebin.com/Kn5EqLNX

redlegamin
  • 43
  • 6
  • 1
    Could you clarify what you mean by "a 200mo file"? – Jon Skeet May 16 '22 at 11:37
  • Here are two hints to improve performance, AFAIK : `if` statements take time, you should **group** them using conditionnal operators (like `&&`). Printing in the console is also time consuming, you should remove the `System.out.println` that is in the `while` loop and wait for the end of the process, or at least only print every 100 or 1000 entries using a modulo `%` – gui3 May 16 '22 at 11:53
  • Another way of optimising (and the *best* way) - `PasswordAuthentication no` in sshd_config ;) – g00se May 16 '22 at 11:55
  • I'm not an expert, but @g00se comment's sounds like a hacker phishing or a really bad joke – gui3 May 16 '22 at 11:58
  • @gui3 *I'm not an expert* Well I'm inclined to believe you, but surely you could have a stab at guessing what `PasswordAuthentication no` *means* and whether it would tend toward hardening or softening security? – g00se May 16 '22 at 12:06
  • Jon Skeet : Sorry, i mean MB, not Mo. gui3 : Ok thx i will try. For `System.out.println`, i already tried to remove it completly to test and improve the speed, but it doesn't do much. At least it improves time by ~0.1 sec, so that's still a win. g00se : I cant, this is a exercice, we only have to create a Java program, not modify the ssh config. Thx tho, didn't know about this. – redlegamin May 16 '22 at 12:08
  • sorry to pollute this thread, but I need to mention to @g00se that **irony is hard to understand in written texts, furthermore when it's not your native language** (that's my case and the case of many users of SO) and if `PasswordAuthentication no` indeed means softening your security (with published IPs furthermore) I believe your comment could fool a lot of non-expert programmers. No offense though - but we're on SO to help or get helped, not to have fun IMO – gui3 May 16 '22 at 12:16
  • @JonSkeet : 'Mo' means "megaoctet', an 'octet' being '8 bits'. – dangling else May 16 '22 at 12:34
  • 1
    `Stack` is inefficient. Use `HashSet` instead. (Even though that's not where most of the CPU spends its time.) – k314159 May 16 '22 at 22:07

1 Answers1

6

If you get a profile of your code, it becomes clear that the problem is in the String.split() method:

enter image description here

This is a known problem in the standard Java library: Java split String performances.

So in order to speed up your code, you need to speed up this part of the code in some way. The first thing I can suggest is to replace the code on lines 75-79 with this:

Pattern pattern = Pattern.compile("\\s+");
while ((line = br.readLine()) != null) {
    if (!line.contains("sshd")) continue;
    String[] arr = pattern.split(line);
    if (arr.length < 11) continue;
...
}

This may speed up the code a bit, but you can see from the profile that a lot of time is still spent in Pattern and Matcher methods. We need to get rid of Pattern and Matcher for a significant speedup.

For single-character patterns split works without using Regex and does it quite efficiently, let's try replacing the code with:

while ((line = br.readLine()) != null) {
    if (!line.contains("sshd")) continue;
    String[] arr = Arrays.stream(line.split(" "))
                    .filter(s -> !s.isEmpty())
                    .toArray(String[]::new);
    if (arr.length < 11) continue;
...
}

This code runs almost twice as fast on the same data.

ikorennoy
  • 220
  • 1
  • 9
  • Nicee! It does lower the executation time per 2 thx a lot! i will now try to use a profile to better understand how to optimize my things – redlegamin May 16 '22 at 12:58