I have an app that processes JBoss logs. It retrieves the active users of the application. The log has lines that tell which user requested which operation.

The lines in the log look like this:

[23 apr 2015 17:14:58,268] [INFO ] [Module: Search (845903)] -- Request is performing by User 'BILL GATES' using the 'App Session' Session

The log files are big (about 1 MB each) and have many lines, and not every line refers to a user action. What I do now is read the file line by line, find each line that contains the string "User", and then retrieve the name from that line.

It works, but it's slow and seems pretty heavy on performance. I'm processing 5+ log files like this.

How can this be done better/faster? Regex? Any suggestions?

Thank you!

Update: code

    If System.IO.File.Exists(tmpusr063) = True Then
        Dim objReader As New System.IO.StreamReader(tmpusr063)
        Do While objReader.Peek() <> -1
            'Get the timestamp from the line
            TextLine = objReader.ReadLine()
            time = Microsoft.VisualBasic.Left(TextLine, 21)
            time = Microsoft.VisualBasic.Right(time, 20)
            tm = DateTime.ParseExact(time, "dd MMM yyyy HH:mm:ss", CultureInfo.CreateSpecificCulture("nl-NL"))
            span = DateTime.Now.TimeOfDay - tm.TimeOfDay
            ' End of getting the timestamp as a time value. span is the difference between now and the time on the line.
            If CInt(span.Hours) < user4timeh Then ' if the time on the line is within the limit I set, in hours
                If CInt(span.Minutes) < user4timem Then ' and in minutes, then start disassembling the line
                    If TextLine.Contains("User") And TextLine.Contains("'") Then ' if the line contains "User", it has a user name
                        linsplt = TextLine.Split("'") 'retrieving the user name
                        If users.Contains(linsplt(1)) = False Then 
                            users(usrnmbr) = linsplt(1) 'assemble an array of user names
                            usrnmbr += 1
                        End If
                    End If

                End If
            End If

        Loop
        objReader.Close()
    End If

But I think I found the culprit. The log has lots of lines, and I don't need the ones where "User" doesn't appear. In my logic, though, I still take each line and parse the time from it, and only then check whether it's a relevant line that has a user name in it. I will move the If statement up. Not smart :)

GeekSince1982
  • Юра, from my experience, reading a text file line by line is exactly the most efficient way of accessing data in text files. I suppose you are using `Substring` and `IndexOf`, right? This is efficient. Using a regex will slow down your app performance. Show us your code, and we'll be able to share our thoughts with you. And a file size of 1MB is not big, believe me, I am reading/writing files of 1GB and more, just reading/writing line by line. – Wiktor Stribiżew Apr 24 '15 at 11:00
  • I've voted to close this question because you have not provided any sample code. It's hard to say why your code is slow if you don't show us what you are doing. I agree with @stribizhev that, on the face of it, it sounds like you are doing everything correctly. So, if we are going to dig any deeper, we'd need to see precisely what you are doing. – Steven Doggart Apr 24 '15 at 12:27
  • @stribizhev :) Here is the code:) – GeekSince1982 Apr 24 '15 at 12:37

1 Answer

This should be a little better:

    If System.IO.File.Exists(tmpusr063) = True Then
        ' Create the culture once, outside the loop
        Dim NLCultureInfo As CultureInfo = CultureInfo.CreateSpecificCulture("nl-NL")
        Dim objReader As New System.IO.StreamReader(tmpusr063)
        Do While objReader.Peek() <> -1
            TextLine = objReader.ReadLine()
            If TextLine.IndexOf("User") > -1 And TextLine.IndexOf("'") > -1 Then ' if the line contains "User", it has a user name
                ' Get the timestamp from the line
                time = TextLine.Substring(1, 20)
                tm = DateTime.ParseExact(time, "dd MMM yyyy HH:mm:ss", NLCultureInfo)
                span = DateTime.Now.TimeOfDay - tm.TimeOfDay
                ' End of getting the timestamp as a time value. span is the difference between now and the time on the line.
                If CInt(span.Hours) < user4timeh Then ' if the time on the line is within the limit I set, in hours
                    If CInt(span.Minutes) < user4timem Then ' and in minutes, then start disassembling the line
                        linsplt = TextLine.Split("'") ' retrieve the user name
                        If Array.IndexOf(users, linsplt(1)) = -1 Then
                            users(usrnmbr) = linsplt(1) ' assemble an array of user names
                            usrnmbr += 1
                        End If
                    End If
                End If
            End If
        Loop
        objReader.Close()
    End If

The main thing I did was move the condition about the user to the top of the loop. This way, you don't need to bother with extracting the time and calculating hours and minutes if the row has no user in it.

The second thing I did was create the CultureInfo object outside the loop. Since you always use the same culture info, it's better to create it only once.

The third and least important thing I did was change your calls to String.Contains into String.IndexOf compared against -1. Based on this post, it should (in theory, anyway) be a tiny bit faster. It might accumulate into a noticeable difference, but I don't really think so.
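One caveat worth checking before relying on the third change: in .NET, `String.Contains(String)` performs an ordinal comparison, whereas `String.IndexOf(String)` without a `StringComparison` argument is culture-sensitive, so passing `StringComparison.Ordinal` keeps the search ordinal. The following is only a sketch of that idea and not part of the original answer. The helper `TryGetUserName` and the marker `"User '"` are hypothetical names chosen for illustration, and the sketch assumes the user name always directly follows `User '` as in the sample log line. It also avoids `Split` by taking the text between the two quotes directly.

```vbnet
Imports System

Module UserNameSketch
    ' Hypothetical helper: returns True and sets userName when the line
    ' contains the marker "User '" followed later by a closing quote.
    Function TryGetUserName(line As String, ByRef userName As String) As Boolean
        Const Marker As String = "User '"
        userName = Nothing

        ' Ordinal search: no culture rules are applied to the comparison.
        Dim start As Integer = line.IndexOf(Marker, StringComparison.Ordinal)
        If start < 0 Then Return False

        ' Move past the marker to the first character of the name.
        start += Marker.Length

        ' Find the quote that closes the name.
        Dim finish As Integer = line.IndexOf("'"c, start)
        If finish < 0 Then Return False

        userName = line.Substring(start, finish - start)
        Return True
    End Function

    Sub Main()
        Dim sample As String = "[23 apr 2015 17:14:58,268] [INFO ] [Module: Search (845903)] -- Request is performing by User 'BILL GATES' using the 'App Session' Session"
        Dim name As String = Nothing
        If TryGetUserName(sample, name) Then
            Console.WriteLine(name)
        End If
    End Sub
End Module
```

Note that this is stricter than the code above: a line that contains "User" and a quote somewhere, but not the exact sequence `User '`, is skipped.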

EDITED by YA: I've corrected one line. The original version did not work:

If users.IndexOf(linsplt(1)) = -1 Then

so I replaced it with:

If Array.IndexOf(users, linsplt(1)) = -1 Then
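
As an aside, `Array.IndexOf` scans the whole array on every lookup, and the array needs a size chosen up front. If changing the type of `users` is acceptable, a `HashSet(Of String)` handles the uniqueness check by itself. This is a minimal sketch under that assumption, not the original poster's code; the sample names in `parsedNames` are hypothetical stand-ins for values parsed from log lines.

```vbnet
Imports System
Imports System.Collections.Generic

Module UniqueUsersSketch
    Sub Main()
        ' Hypothetical sample names standing in for values parsed from log lines.
        Dim parsedNames As String() = {"BILL GATES", "STEVE JOBS", "BILL GATES"}

        Dim users As New HashSet(Of String)()
        For Each parsedName As String In parsedNames
            ' Add returns False when the name is already present,
            ' so no separate lookup is needed before inserting.
            users.Add(parsedName)
        Next

        ' The set holds each distinct name exactly once.
        Console.WriteLine(users.Count)
    End Sub
End Module
```

With a set, the `usrnmbr` counter is no longer needed, since `users.Count` gives the number of distinct names.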

Zohar Peled
  • Thanks! I'll update my code now and see how it runs! :) Also, I've made a small correction to your code, one line. Update: Tested! It runs WAY faster! :) – GeekSince1982 Apr 24 '15 at 13:22
  • Glad to help. The key here was that you wrote that if the row has no user, you don't need it. There is no point in calculating anything until you make sure you actually need the row. The other changes I've made may be improvements, but they are small and insignificant compared to moving the logic into the main condition. – Zohar Peled Apr 24 '15 at 13:30