1

I am analyzing service desk tickets and I need to extract the first timestamp from the comments column. That is, I need to know at what date and time the service desk analyst interacted with a ticket for the first time. I have used the datefinder.find_dates() function, and it works reasonably well, but some ticket comments are very technical and contain lots of numbers and IP addresses. This seems to confuse datefinder.find_dates(), and it often returns irrelevant data. I have searched for a tutorial on the function, but none of the ones I found were helpful, as the function does not seem to be very popular. I have also found this and this Stack Overflow question, but they don't address my issue.

Because datefinder.find_dates() does not work well when the text contains a lot of numeric data, the only other option I see is to extract the timestamp from the last paragraph of every observation, since it is always located at the beginning of that paragraph. I have not been able to do this myself, hence this question.

Here is a snippet of how most of the data is laid out:

2019-04-10 12:43:54 - Andras Eger (Work notes)
Sim life cycle attached

2019-04-09 17:25:38 - Timea Magyar (Additional comments)
Thank you for contacting us.
We confirm that we have received your email and we are processing the 
case.
As soon as we get any update from the resolver team, we will inform you.

2019-04-09 17:25:25 - Timea Magyar (Work notes)
VTIS: INC000033296089

2019-04-09 17:22:10 - Timea Magyar (Work notes)
This New Incident was raised on behalf of Daniel Orejuela from [code]<a 
href='new_call.do?sys_id=0b580c90dbf837404cd858a5dc961989&
sysparm_stack=new_call_list.do?sysparm_query=active=true'>CALL0109649</a>
[/code][code]<br><p><span>Call Notes

So the main question is: how do I extract the date and time from the last paragraph of every observation? In this case, the output should be:

2019-04-09 17:22:10
Simon Gibbs
Raul Gonzales

3 Answers

1

First split your input on \n\n, take the last element of the resulting list, and then apply a regex to it.

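import re

text = "..."

# The oldest entry is the last blank-line-separated paragraph.
last_paragraph = text.split("\n\n")[-1]

# Match a "YYYY-MM-DD HH:MM:SS" timestamp and keep the first hit.
result = re.findall(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}", last_paragraph)[0]

print(result)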

Result:

2019-04-09 17:22:10
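
A possible variation, in case a comment body itself contains blank lines (so that the last "\n\n" chunk is no longer the start of the oldest entry): anchor the pattern to line starts and take the last header that matches. This is only a sketch. It assumes every entry header begins a line with "YYYY-MM-DD HH:MM:SS - ", and the names HEADER_RE and first_interaction are made up for illustration.

```python
import re

# Assumption: each entry header starts a line with "YYYY-MM-DD HH:MM:SS - ".
# re.MULTILINE makes ^ match at the start of every line, so numbers and
# IP addresses inside the comment bodies are ignored.
HEADER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - ", re.MULTILINE)


def first_interaction(comments):
    """Return the timestamp of the oldest (last-listed) entry, or None."""
    matches = HEADER_RE.findall(comments)
    return matches[-1] if matches else None


sample = (
    "2019-04-10 12:43:54 - Andras Eger (Work notes)\n"
    "Sim life cycle attached\n"
    "\n"
    "2019-04-09 17:22:10 - Timea Magyar (Work notes)\n"
    "Server 10.0.0.1 rebooted\n"
    "\n"
    "second paragraph of the same note\n"
)

print(first_interaction(sample))  # 2019-04-09 17:22:10
```

Returning None when nothing matches avoids the IndexError that indexing with [0] raises on an observation without any timestamp.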
Henry Yik
1

Try this:

>>> import datefinder
>>> s = '''2019-04-10 12:43:54 - Andras Eger (Work notes)
... Sim life cycle attached
...
... 2019-04-09 17:25:38 - Timea Magyar (Additional comments)
... Thank you for contacting us.
... We confirm that we have received your email and we are processing the
... case.
... As soon as we get any update from the resolver team, we will inform you.
...
... 2019-04-09 17:25:25 - Timea Magyar (Work notes)
... VTIS: INC000033296089
...
... 2019-04-09 17:22:10 - Timea Magyar (Work notes)
... This New Incident was raised on behalf of Daniel Orejuela from [code]<a
... href='new_call.do?sys_id=0b580c90dbf837404cd858a5dc961989&
... sysparm_stack=new_call_list.do?sysparm_query=active=true'>CALL0109649</a>
... [/code][code]<br><p><span>Call Notes'''
>>> matches = list(datefinder.find_dates(s))
>>> matches
[datetime.datetime(2019, 4, 10, 12, 43, 54), datetime.datetime(2019, 4, 9, 17, 25, 38), datetime.datetime(2019, 4, 9, 17, 25, 25)]
>>> matches[-1]
datetime.datetime(2019, 4, 9, 17, 25, 25)
>>> str(matches[-1])
'2019-04-09 17:25:25'

Hope this will help.

  • This is what I initially did, and it does return the first date, but I have some observations (that I cannot share because they contain sensitive data) that confuse datefinder, and it returns nonsense data. Thank you for your help, though. – Raul Gonzales Jun 03 '19 at 10:18
  • @RaulGonzales You are welcome. Let me know how you are getting the data, because it is hard to judge how to separate the paragraphs. – Amazing Things Around You Jun 03 '19 at 10:27
1

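This will help. If you want, you can use the datetime library to parse the result into a datetime object.

def date_extract(para):
    # Take the last blank-line-separated paragraph, then keep the text
    # before the first ' - ', which is the timestamp.
    para = para.split('\n\n')[-1].split(' - ')[0]
    print(para)
    return para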
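
As a rough sketch of the datetime step mentioned above: the extracted string can be passed to datetime.strptime. The function name parse_first_timestamp and the sample text are illustrative only, and the format string assumes timestamps always look like "2019-04-09 17:22:10".

```python
from datetime import datetime


def parse_first_timestamp(para):
    """Slice out the leading timestamp of the last paragraph and parse it."""
    # Same slicing as date_extract: last paragraph, text before ' - '.
    stamp = para.split("\n\n")[-1].split(" - ")[0]
    # Assumption: the timestamp format is always YYYY-MM-DD HH:MM:SS.
    # strptime raises ValueError if the text does not match.
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


text = (
    "2019-04-10 12:43:54 - Andras Eger (Work notes)\n"
    "Sim life cycle attached\n"
    "\n"
    "2019-04-09 17:22:10 - Timea Magyar (Work notes)\n"
    "VTIS: INC000033296089"
)

first_seen = parse_first_timestamp(text)
print(first_seen)
```

Because strptime raises ValueError on anything that is not a timestamp, a malformed observation fails loudly instead of silently yielding a wrong date.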
m0h17