
I am new to Python.

I want to find profiles in a log file that meet the following criteria:

  • the user logged in, changed their password, and logged off within the same second
  • those actions (log in, change password, log off) happened one after another with no other entries in between.

The .txt file looks like this:

Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|iukj| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|klij| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|yytr| - |user logged in| -

asdf is a typical profile name from the log file.

Here is what I have done so far:

import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)
for line, count in counts.most_common():
    print(line, count)
    
time.sleep(10)

I know the logic is to find entries whose hours, minutes, and seconds are identical and then print their profiles, but I am confused about how to get the time from the file.

Any help is very much appreciated.

EDIT:

The output would be:
asdf
klij
plnb
zzad
anub13
  • Split each line at the `|`. Look at the output of `print("Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -".split('|'))`. – Matthias Jul 06 '22 at 14:28
  • Thank you for the suggestion @Matthias, I edited the question to show the expected output. – anub13 Jul 06 '22 at 14:36
  • `...how to get time ...` - which part are you confused about? – wwii Jul 06 '22 at 15:07

2 Answers


To parse the time, I would use a regular expression that matches the time on each line.

Something like this would work:

EDIT: The second snippet now skips lines that don't match the expected format.

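import re

# `line` is one line of the log file; .group() returns the matched text, e.g. '13:15:39'.
# The result is named `timestamp` so it does not shadow the `time` module used below.
timestamp = re.search(r'(\d+):(\d+):(\d+)', line).group()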
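As a minimal sketch of that idea, the snippet below applies the same pattern to one line copied from the question. The names `TIME_PATTERN`, `extract_time` and `sample_line` are made up for illustration; the function returns `None` when a line contains no time, so blank or malformed lines can be skipped instead of raising an error.

```python
import re

# Matches HH:MM:SS, e.g. 13:15:39
TIME_PATTERN = re.compile(r'(\d+):(\d+):(\d+)')

def extract_time(line):
    """Return the first HH:MM:SS substring in line, or None if there is none."""
    match = TIME_PATTERN.search(line)
    return match.group() if match else None

# One line copied from the sample log in the question
sample_line = "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -"

print(extract_time(sample_line))  # 13:15:39
print(extract_time(""))           # None
```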

For the profile name, I would split each line as @Matthias suggested, so your code would look something like this:

import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)
for line, count in counts.most_common():
    # Split the line at each '|' to get a list of fields.
    # The third field (index 2) is the profile name.
    list_of_segments = line.split('|')
    # Skip lines that do not have the expected six fields.
    if len(list_of_segments) == 6:
        print(list_of_segments[2])

time.sleep(10)
Daniel
  • I tried the code and it gives me an error print(line.split('|')[2]) IndexError: list index out of range – anub13 Jul 06 '22 at 15:06
  • I tried to reproduce your log file from your example and ran the script. Given the example, the snippet of code works fine. It might be that some lines are empty or have a different kind of formatting. – Daniel Jul 06 '22 at 15:16
  • OK, let me try again. – anub13 Jul 06 '22 at 15:19
  • @anub13 I've edited the code to omit lines that don't correspond to the formatting. Could you please try again? – Daniel Jul 06 '22 at 15:31

I think this is more complicated than you might have imagined. Your sample data is very straightforward, but the description (requirements) implies that the log might have interspersed lines that you need to account for. So I think it's a case of working through the log file sequentially, recording certain actions (log in, log off) and keeping a note of what was observed on the previous lines. This seems to work with your data:

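from datetime import datetime as DT, timedelta as TD

FMT = '%a, %d %b %Y %H:%M:%S %z'  # e.g. 'Mon, 22 Aug 2016 13:15:39 +0200'
td = TD(seconds=1)
prev = None  # (timestamp, profile) of a pending login, or None

with open('logfile.txt') as logfile:
    for line in logfile:
        # Skip blank or malformed lines that lack the expected fields
        if len(tokens := line.split('|')) > 4:
            dt, _, profile, _, action, *_ = tokens
            if prev is None or prev[1] != profile:
                # Start tracking only when this line is a login
                prev = (dt, profile) if action == 'user logged in' else None
            else:
                if action == 'user logged off':
                    # Same profile logged off: report it if at most one second after the login
                    if DT.strptime(dt, FMT) - DT.strptime(prev[0], FMT) <= td:
                        print(profile)
                    prev = None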

Output:

asdf
plnb
qweq
zzad
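
For a stricter reading of the question's two criteria, a sliding window over the last three entries is one possible approach. This is only a sketch under stated assumptions: `WANTED`, `find_profiles` and `sample_lines` are hypothetical names, blank or malformed lines are assumed not to count as entries, and "same second" is checked by comparing the timestamp strings, which assumes every line uses the same format and UTC offset. The sample lines are copied from the question.

```python
from collections import deque

# The three actions, in the order they must appear on consecutive lines
WANTED = ('user logged in', 'user changed password', 'user logged off')

def find_profiles(lines):
    """Yield each profile whose three wanted actions sit on consecutive
    entries that share one timestamp string and one profile name."""
    window = deque(maxlen=3)  # the three most recent well-formed entries
    for line in lines:
        fields = line.strip().split('|')
        if len(fields) < 5:
            continue  # assumption: blank or malformed lines are ignored
        timestamp, _, profile, _, action = fields[:5]
        window.append((timestamp, profile, action.strip()))
        if len(window) == 3:
            stamps, profiles, actions = zip(*window)
            if len(set(stamps)) == 1 and len(set(profiles)) == 1 and actions == WANTED:
                yield profile

sample_lines = [
    "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -",
    "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user changed password| -",
    "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged off| -",
    "Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|klij| - |user logged in| -",
    "Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user changed password| -",
    "Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user logged off| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged in| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed password| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed profile| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged off| -",
    "Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged in| -",
    "Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user changed password| -",
    "Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged off| -",
]

print(list(find_profiles(sample_lines)))  # ['asdf', 'zzad']
```

Under this reading, qweq is left out because a "user changed profile" entry sits between the password change and the log off. To run it on a file, pass the open file object instead of the list, for example `find_profiles(open('logfile.txt'))`.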
DarkKnight
  • Thank you Sir @Stuart, but when I tried the code it gives me error `dt, _, profile, _, action, *_ = line.split('|') ValueError: not enough values to unpack (expected at least 5, got 1)` – anub13 Jul 06 '22 at 15:18
  • @anub13 That must be because your file isn't as you've shown it in the question. Does it have blank lines perhaps? – DarkKnight Jul 06 '22 at 15:20
  • Looks like the log file isn't consistent on every line. A solution would be to omit such lines which don't match the format. – Daniel Jul 06 '22 at 15:21
  • `https://pastebin.com/7utgYFEg` here is the actual file, edited the profiles name only – anub13 Jul 06 '22 at 15:22
  • @anub13 Answer edited – DarkKnight Jul 06 '22 at 15:24
  • Thank you sir @Stuart, but right now it just gives a blank page. I only started learning Python 2 days ago; sorry, I need more hints on this. – anub13 Jul 06 '22 at 15:32
  • @anub13 I downloaded the file from pastebin and the output is:- fxsciaqulmlk erdsfsdfsdf abcbbabab fxsciaqulmla – DarkKnight Jul 06 '22 at 15:36
  • Sir @Stuart, yes, I forgot I edited the original file, it works now! thank you so much! – anub13 Jul 06 '22 at 15:39