Get grouped Dictionary list from a file that has a time and errors then plot the time differences in python

Question

I have this file as below:

Date;Time;Task;Error_Line;Error_Message
03-13-15;08:2123:10;C:LOGINMAN;01073;Web Login Successful from IP Address xxx.xxx.x.xx
03-13-15;05:23:1235;B:VDOM;0906123;Port 123 Device 1012300 Remote 1 1012301 Link Up RP2009
03-13-15;05:23:123123;A:VCOM;0906123;Port 123 Device 1012300 Remote 1 1012301 Link Up RP2009
03-13-15;05:23:123123;B:VDOM;1312325;Port 123 Device 1012300 Remote 1 1012301 Receive Time Error: 2123666 23270 1396 69
03-13-15;05:23:1233;B:VDOM;13372;Port 123 Device 1012300 Remote 1 1012301 Send Time Error: 123123123 1888 1123123123 69
03-13-15;05:23:1233;A:VCOM;1312325;Port 123 Device 1012300 Remote 1 1012301 Receive Time Error: 2123666 23270 1396 69
03-13-15;05:23:1233;A:VCOM;13372;Port 123 Device 1012300 Remote 1 1012301 Send Time Error: 123123123 1888 1123123123 69
03-13-15;05:21:56;B:VDOM;07270;Port 123 Device 1012300 Remote 1 1012301 AT Timer Expired
03-13-15;05:21:56;A:VCOM;07270;Port 123 Device 1012300 Remote 1 1012301 AT Timer Expired

The desired output should look like this:

D = {'Error_Line1': [Time1, Time2, ...], 'Error_Line2': [Time1, Time2, ...], ...}

I want to plot the time differences for each Error_Line. Each Error_Line occurs at several different times in my file, so I want to group the times by Error_Line. I have no idea whether that will work for plotting the times.

score 1 · Answer 1 · answered Oct 04 '16 at 19:10

As far as grouping by line number, this should do the trick:

import csv
D = {}
with open('logfile') as f:
    reader = csv.DictReader(f, delimiter=';')
    for row in reader:
        el = row['Error_Line']
        if el not in D:
            D[el] = []  # Initiate an empty list
        D[el].append(row['Time'])
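
To try the grouping without a file on disk, here is a minimal sketch that feeds csv.DictReader an in-memory sample instead. The sample rows are hypothetical (shortened messages, well-formed times), and dict.setdefault is swapped in for the explicit "if el not in D" check; the behavior should be the same.

```python
import csv
import io

# Hypothetical, well-formed sample standing in for the real log file.
sample = io.StringIO(
    "Date;Time;Task;Error_Line;Error_Message\n"
    "03-13-15;05:21:56;B:VDOM;07270;AT Timer Expired\n"
    "03-13-15;05:21:58;A:VCOM;07270;AT Timer Expired\n"
    "03-13-15;05:23:12;B:VDOM;13372;Send Time Error\n"
)

D = {}
for row in csv.DictReader(sample, delimiter=';'):
    # setdefault creates the empty list the first time an Error_Line is seen
    D.setdefault(row['Error_Line'], []).append(row['Time'])

print(D)
```

Each key is an Error_Line value and each list holds that line's Time strings in file order.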

score 1 · Answer 2 · edited May 23 '17 at 12:19

I won't touch the plotting because there are multiple ways of displaying the data and I don't know what style you're looking for. Do you want to have separate graphs for each Error_Line? Each Error_Line's datapoints represented on one graph? Some other way of comparing times and errors (e.g. mean of each Error_Line's times plotted against each other, variance, yadda yadda)?

Getting that info into a dict, however, will involve getting each line, splitting it with the semicolon as the delimiter, and picking the pieces out that you want. Personally I'd do this as such:

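from collections import defaultdict

ourdata = defaultdict(list)
with open('stackoverflow.txt') as ourfile:
    next(ourfile, None)  # skip the header row so it isn't stored as data
    for row in ourfile:
        # fields 1 and 3 of each line are Time and Error_Line
        time, error_line = row.split(';')[1:4:2]
        ourdata[error_line].append(time)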

The timestamps are currently strings, so you'll also need to convert them (doing it inside the append call handles everything in one pass) to the data type you need (e.g. numpy's datetimes, which allow for arithmetic).
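
As a rough sketch of that conversion, the snippet below uses the standard library's datetime rather than numpy and assumes the times are meant to be HH:MM:SS. Some values in the posted sample (e.g. 08:2123:10) don't fit that format, so here they are simply skipped; whether that is the right handling depends on your data. The helper names parse_time and time_differences, and the sample dict, are made up for illustration.

```python
from datetime import datetime

def parse_time(text):
    """Return a datetime for an HH:MM:SS string, or None if it does not parse."""
    try:
        return datetime.strptime(text.strip(), "%H:%M:%S")
    except ValueError:
        return None

def time_differences(grouped):
    """Map each error line to the gaps, in seconds, between its sorted times."""
    diffs = {}
    for error_line, times in grouped.items():
        # Drop unparseable values, then sort so the gaps are chronological.
        parsed = sorted(t for t in map(parse_time, times) if t is not None)
        diffs[error_line] = [
            (later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])
        ]
    return diffs

# Hypothetical grouped data in the shape built above: {Error_Line: [Time, ...]}
sample = {
    "07270": ["05:21:56", "05:22:26", "05:23:01"],
    "01073": ["08:2123:10"],  # malformed, so it is skipped
}
gaps = time_differences(sample)
print(gaps)
```

The resulting lists of seconds are plain numbers, so they can be handed to whichever plotting approach you settle on.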

For more information on what's going on here, see: defaultdict, with, str.split(), extended slice notation
