I am working on extracting data from a text file. I have been using MATLAB for a while, but it felt cumbersome for this kind of work, so I started using Python for the extraction. Now I have a fairly complex problem and no idea how to approach it.
Here's what I've done so far. I have a log file that looks like this:
2017-12-21T23:59:19.120Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:19.120Z 'D|Beat:2143|B->'
2017-12-21T23:59:19.120Z 'D|Beat:2113|sndB:0x5caa'
2017-12-21T23:59:19.175Z 'I|PSnd: 61|snd[3D]:FFFF m:0x5caa e:0'
2017-12-21T23:59:19.175Z 'I|PSnd: 233|sD[3D]m:0x5caa e:0'
2017-12-21T23:59:19.175Z 'D|Beat:1259|WDTimeout: 300'
2017-12-21T23:59:19.175Z 'D|Beat:1282|sd:0x5caa: e:0'
2017-12-21T23:59:19.175Z 'D|Beat:1302|sprts'
2017-12-21T23:59:19.175Z 'D|LgPl: 68|BSP:getSize:19'
2017-12-21T23:59:19.175Z 'D|Beat:5503|GetPckt:0x4e5e'
2017-12-21T23:59:19.175Z 'D|Beat:7140|Prtns->'
2017-12-21T23:59:19.175Z 'D|Beat:2008|sevt:72'
2017-12-21T23:59:19.175Z 'I|Beat:2021|SndQ:1'
2017-12-21T23:59:19.175Z 'D|Beat:1805|snd:0x4e5e'
2017-12-21T23:59:19.175Z 'I|PSnd: 61|snd[B0]:FFFF m:0x4e5e e:0'
2017-12-21T23:59:19.175Z 'I|PSnd: 233|sD[B0]m:0x4e5e e:0'
2017-12-21T23:59:19.175Z 'D|Beat:1866|sd:0x4e5e:0'
2017-12-21T23:59:19.175Z 'D|Beat:1192|drop:2402 q:43'
2017-12-21T23:59:19.301Z 'D|Beat:1220|Rcv<-RP, s:2402'
2017-12-21T23:59:19.301Z 'D|LgPl: 68|BSP:getSize:19'
2017-12-21T23:59:19.301Z 'I|Beat:1243|RcvQ:1'
2017-12-21T23:59:19.301Z 'D|Beat:1245|FrMsg:0x4cc0 QMsg:0x3ba4'
2017-12-21T23:59:19.301Z 'D|Beat:8934|AAltB->B1302'
2017-12-21T23:59:19.416Z 'D|Beat:1192|drop:2402 q:50'
2017-12-21T23:59:19.416Z 'D|Beat:10392|RStp'
2017-12-21T23:59:19.437Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:19.489Z 'D|Beat:6502|slt:2'
2017-12-21T23:59:19.489Z 'D|Beat:10341|RStrt'
2017-12-21T23:59:19.489Z 'D|Beat:4713|prtTS:2'
2017-12-21T23:59:19.489Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:19.552Z 'D|Beat:1192|drop:2402 q:36'
2017-12-21T23:59:19.820Z 'D|Beat:1192|drop:2402 q:48'
2017-12-21T23:59:19.820Z 'D|Beat:10747|PLife:67'
2017-12-21T23:59:19.820Z 'D|Beat:4906|nojump'
2017-12-21T23:59:19.820Z 'D|Beat:10392|RStp'
2017-12-21T23:59:19.820Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:19.873Z 'D|Beat:6502|slt:3'
2017-12-21T23:59:20.266Z 'D|Beat:6502|slt:4'
2017-12-21T23:59:20.266Z 'D|Beat:10341|RStrt'
2017-12-21T23:59:20.266Z 'D|Beat:4713|prtTS:4'
2017-12-21T23:59:20.266Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:20.318Z 'D|Beat:1192|drop:2301 q:49'
2017-12-21T23:59:20.339Z 'D|Beat:1358|drop:2301 q:49'
2017-12-21T23:59:20.339Z 'D|Beat:1220|Rcv<-RP, s:2402'
2017-12-21T23:59:20.339Z 'D|LgPl: 68|BSP:getSize:19'
2017-12-21T23:59:20.339Z 'I|Beat:1243|RcvQ:1'
2017-12-21T23:59:20.339Z 'D|Beat:1245|FrMsg:0x4192 QMsg:0x4cc0'
2017-12-21T23:59:20.339Z 'D|Beat:1192|drop:2402 q:48'
2017-12-21T23:59:20.454Z 'D|Beat:1192|drop:2402 q:51'
2017-12-21T23:59:20.579Z 'D|Beat:1192|drop:2402 q:48'
2017-12-21T23:59:20.610Z 'D|Beat:10747|PLife:68'
2017-12-21T23:59:20.610Z 'D|Beat:4906|nojump'
2017-12-21T23:59:20.610Z 'D|Beat:10392|RStp'
2017-12-21T23:59:20.610Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:20.632Z 'D|Beat:6502|slt:5'
2017-12-21T23:59:21.045Z 'D|Beat:6502|slt:6'
2017-12-21T23:59:21.045Z 'D|Beat:10341|RStrt'
2017-12-21T23:59:21.045Z 'D|Beat:4713|prtTS:6'
2017-12-21T23:59:21.045Z 'D|Beat: 971|RStrtD'
Now I need to extract every line that contains RStrtD and is followed, later in the file, by a line that contains RStpD. For every such pair in the text file I need the time difference between the two lines, and then I need to add all of those differences together.
I extracted the matching lines using the code below:
print("trying out something spectacular")

def get_line(file_content, find_word1, find_word2):
    # Keep every line that contains either of the two search words.
    lines = []
    for line in file_content.strip().split('\n'):
        if find_word1 in line or find_word2 in line:
            lines.append(line)
    return lines

def get_all_lines(f_name, find_word1, find_word2):
    # Read the whole file, then filter its lines.
    with open(f_name, 'r') as f:
        f_content = f.read()
    return get_line(f_content, find_word1, find_word2)

def get_files_in(in_file, find_word1, find_word2, out_file):
    # Write the matching lines of in_file to out_file.
    filtered_lines = get_all_lines(in_file, find_word1, find_word2)
    joined_lines = '\n'.join(filtered_lines)
    with open(out_file, 'w') as f:
        f.write(joined_lines)

#fix= "mm", "cts"
get_files_in("./sss1.txt", "RStrtD", "RStpD", "./result1.txt")
After running this, I received the following output:
2017-12-21T23:59:43.561Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:44.419Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:44.715Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:46.730Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:47.062Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:48.273Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:48.625Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:49.487Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:49.783Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:51.789Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:52.122Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:53.334Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:53.680Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:54.529Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:54.835Z 'D|Beat: 997|RStpD'
2017-12-21T23:59:56.840Z 'D|Beat: 971|RStrtD'
2017-12-21T23:59:57.182Z 'D|Beat: 997|RStpD'
This is good, but I now need to subtract the time on each RStrtD line from the time on the RStpD line that follows it, and then take the sum of all of these differences. I really don't know how I can go about this. I'm not yet familiar with handling time values in Python.
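To make the goal concrete, here is a rough sketch of the calculation I am after, not a tested solution. It assumes that every timestamp has the form 2017-12-21T23:59:19.120Z, that each RStrtD line should be paired with the next RStpD line, and that an RStpD line with no preceding RStrtD (like the first line of my output) should be skipped. The names `parse_timestamp` and `total_start_to_stop` are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. 2017-12-21T23:59:19.120Z
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(line):
    """Return the leading timestamp of a log line as a datetime."""
    return datetime.strptime(line.split()[0], TIME_FORMAT)


def total_start_to_stop(lines, start_word="RStrtD", stop_word="RStpD"):
    """Sum the time from each start line to the next stop line."""
    total = timedelta(0)
    start_time = None
    for line in lines:
        if start_word in line:
            start_time = parse_timestamp(line)  # remember the latest start
        elif stop_word in line and start_time is not None:
            total += parse_timestamp(line) - start_time
            start_time = None  # wait for the next start
    return total


# First few lines of the filtered output shown above.
sample = [
    "2017-12-21T23:59:43.561Z 'D|Beat: 997|RStpD'",  # no start before it: skipped
    "2017-12-21T23:59:44.419Z 'D|Beat: 971|RStrtD'",
    "2017-12-21T23:59:44.715Z 'D|Beat: 997|RStpD'",
    "2017-12-21T23:59:46.730Z 'D|Beat: 971|RStrtD'",
    "2017-12-21T23:59:47.062Z 'D|Beat: 997|RStpD'",
]
total = total_start_to_stop(sample)
print(total.total_seconds())  # 0.296 s + 0.332 s
```

If the pairing should go the other way (from each RStpD to the following RStrtD), swapping the two search words in the call would give that sum instead.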