How to take a date value from a string in Python?

Question

I'm fetching a value from a URL.

import urllib2
response = urllib2.urlopen('url')    
response.read()

It gives me a very long string, so I have only included the part I have an issue with.

STRING TYPE OUTPUT:

'<p>Dear Customer,</p>
<p>This notice serves as proof of delivery for the shipment listed below.</p>
<dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
<dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
<dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
<dt><label>Left At:</label></dt>
<dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'

QUESTION:

How can I extract the date (09/14/2015 11:07 A.M.) that is assigned to Delivered On?

If the time format has a constant length, you might use something like re.search('Delivered On: (.*)$', a).group(1)[:20], where a is the string. – Vineesh, Sep 25 '15 at 08:04
@Vineesh, thank you so much for your comment. Your code works fine, but it fails when Delivered On: is empty. Here is the error: *AttributeError: 'NoneType' object has no attribute 'group'* – Bhavesh Odedra, Sep 25 '15 at 13:20
Can you add a check for it? For example, "data = re.search('Delivered On: (.*)$', a)" and then "if data: data.group(1)[:20]". This should handle the NoneType case. – Vineesh, Sep 25 '15 at 13:29

score 6 · Answer 1 · edited Sep 25 '15 at 16:24

You could start by using something like Beautiful Soup or some other html parser. It might look something like this:

from bs4 import BeautifulSoup
import urllib2
response = urllib2.urlopen('url')    
html = response.read()
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
datestr = soup.find("label", text="Delivered On:").find_parent("dt").find_next_sibling("dd").string

And if you need to, once you have a hold of the date string, you can use strptime to convert it to a datetime object.

import datetime
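# "%p" expects "AM"/"PM", so normalize "A.M."/"P.M." before parsing
date = datetime.datetime.strptime(datestr.replace("A.M.", "AM").replace("P.M.", "PM"), "%m/%d/%Y %I:%M %p")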

Remember - you generally should not find yourself parsing HTML or XML with regexes...
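
If installing Beautiful Soup is not an option, the same idea can be sketched with the standard library's `html.parser`. This is only a rough illustration, assuming Python 3 and markup shaped like the sample in the question (a `<label>` followed by a `<dd>`); the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Hypothetical helper: maps each <label> text to the next <dd> text."""

    def __init__(self):
        super().__init__()
        self.fields = {}     # label text -> dd text
        self._tag = None     # "label" or "dd" while we are inside one
        self._label = None   # most recent label still waiting for its <dd>
        self._buf = []       # text pieces collected for the current tag

    def handle_starttag(self, tag, attrs):
        if tag in ("label", "dd"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore stray closing tags such as the extra </dt>
        text = "".join(self._buf).strip()
        if tag == "label":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._tag = None


def delivered_on(html):
    """Return the 'Delivered On:' text, or None if it is missing or empty."""
    parser = DefinitionListParser()
    parser.feed(html)
    parser.close()
    return parser.fields.get("Delivered On:") or None
```

Because the lookup goes through a dictionary, a missing label or an empty `<dd>` gives `None` instead of raising `AttributeError`.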

edited Sep 25 '15 at 16:24

jfs

answered Sep 25 '15 at 08:04

stett

"Never Say Never Again". If you want to parse 1B letters, it's better to write your own tool to parse the HTML instead of using `BeautifulSoup`, because Soup is a tool for HTML analysis. It does a lot of work that you (probably) don't need. Also, Soup is not memory efficient. – Jimilian Sep 25 '15 at 08:33
Haha, okay, yes, you're right... never say never. I was just thinking about this famous question (and top answer): http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – stett Sep 25 '15 at 08:36
Now it's much better ;) Here is your +1 :) Btw, look at the second answer in that topic :) – Jimilian Sep 25 '15 at 08:36
@Jimilian: no, regex is even less of an answer with larger masses of XML. There are fast tools to parse XML that are not BeautifulSoup. Doesn't mean regex is the only alternative. – mike3996 Sep 25 '15 at 08:39
In general, there are XML parsers that build up a usable DOM representation (like BS), and then there are parsers that read XML into a tokenized stream, usually only used when the input XML doesn't fit into memory. – mike3996 Sep 25 '15 at 09:01
@stett, thank you so much for your answer. I used your code, but it gives me the error *AttributeError: 'NoneType' object has no attribute 'find_parent'* – Bhavesh Odedra Sep 25 '15 at 13:21
@Odedra: the label text was imprecise. If you don't need an exact match, you could use `text=re.compile('Delivered On')` instead. – jfs Sep 25 '15 at 16:25
`strptime()` also fails. You could use `.strptime(datestr.replace('A.M.', 'am').replace('P.M.', 'pm'), "%m/%d/%Y %I:%M %p")` instead. – jfs Sep 25 '15 at 16:27

score 1 · Accepted Answer · edited Sep 25 '15 at 13:30

Try this code:

import re

text = '''<p>Dear Customer,</p>
          <p>This notice serves as proof of delivery for the shipment listed below.</p>
          <dl class="outHozFixed clearfix"><label>Weight:</label></dt>
          <dd>18.00 lbs</dd>
          <dt><label>Shipped&#047;Billed On:</label></dt>
          <dd>09/11/2015</dd>
          <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
          <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
          <dt><label>Left At:</label></dt>
          <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

re.findall(r'<dt><label>Delivered On:<\/label><\/dt><dd>([0-9\.\/\s:APM]+)', text)

OUTPUT:

['09/14/2015 11:07 A.M.']

score 1 · Answer 3 · answered Sep 25 '15 at 08:08

Based on that output only, I would use re and re.search. Create a regex for finding a date with time, like this:

import re

output = '''<p>Dear Customer,</p>
            <p>This notice serves as proof of delivery for the shipment listed below.</p>
            <dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
            <dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
            <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
            <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
            <dt><label>Left At:</label></dt>
            <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

pattern = r'\d{2}/\d{2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.'

result = re.search(pattern, output).group(0)

Thank you so much. Your code works fine, but it fails when Delivered On: is empty. Here is the error: AttributeError: 'NoneType' object has no attribute 'group' – Bhavesh Odedra, Sep 25 '15 at 13:29
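
One way to avoid the `AttributeError` from the comment above is to check the result of `re.search` before calling `group`. The following is a minimal sketch, assuming Python 3, markup like the sample in the question, and a locale in which `%p` matches "AM"/"PM". The names `DELIVERED_RE` and `parse_delivered_on` are hypothetical.

```python
import re
from datetime import datetime

# Anchor on the label so another date on the page cannot match by accident;
# \s* tolerates whitespace or newlines between the tags.
DELIVERED_RE = re.compile(
    r"Delivered On:</label>\s*</dt>\s*<dd>\s*"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.)"
)


def parse_delivered_on(html):
    """Return the delivery time as a datetime, or None if there is no date."""
    match = DELIVERED_RE.search(html)
    if match is None:          # label missing or <dd> empty
        return None
    cleaned = match.group(1).replace(".", "")   # "A.M." -> "AM"
    return datetime.strptime(cleaned, "%m/%d/%Y %I:%M %p")
```

With this shape the caller only has to test for `None`, for example `if parse_delivered_on(html) is None: ...`.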

Jimilian · Answer 4 · 2015-09-25T14:21:51.110

1

If you don't like regexes and third-party libraries, you can always use an old-school, hardcoded one-line solution:

import datetime

text_date = [item.strip() for item in input_text.split('\n') if "Delivered On:" in item][0][41:-5]
datetime.datetime.strptime(text_date.replace(".",""), "%m/%d/%Y %I:%M %p")

For the single-line case:

start_index = input_text.index("Delivered On:")+len("Delivered On:</label></dt><dd>")
stop_index = start_index + 21
text_date = input_text[start_index:stop_index]

Because any solution to your question will be some form of hardcoding :(
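
The fixed offsets above can be swapped for tag lookups, so the same code handles both the one-line and the multi-line response. This is just a sketch under the assumption that the value always sits in the first `<dd>` after the label; `text_after_label` is a made-up helper name.

```python
def text_after_label(html, label="Delivered On:"):
    """Return the text of the first <dd> after `label`, or None."""
    label_pos = html.find(label)
    if label_pos == -1:
        return None                      # label not on the page
    start = html.find("<dd>", label_pos)
    if start == -1:
        return None                      # no value element after the label
    start += len("<dd>")
    end = html.find("</dd>", start)
    if end == -1:
        return None                      # unterminated <dd>
    return html[start:end].strip() or None   # empty value -> None
```

It is still tied to the markup, but not to the length of the date or to where line breaks fall.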

edited Sep 25 '15 at 14:21

answered Sep 25 '15 at 08:29

Jimilian

Thank you for your answer, but this code does not fetch the date. – Bhavesh Odedra Sep 25 '15 at 13:25
If I test @Alexandr Faizullin's code, I get what I want, but with yours I don't. – Bhavesh Odedra Sep 25 '15 at 14:01
That sounds fair enough, but what output do you get? Can you show it? I'm just curious. – Jimilian Sep 25 '15 at 14:10
Yes, I will. FYI, the input text will be on one line, without *"\n"*; that might be the issue. You are testing line by line, and I get the response from the server on one line. – Bhavesh Odedra Sep 25 '15 at 14:16
@Odedra, yep, for the one-line case the solution should be different :) – Jimilian Sep 25 '15 at 14:22
Thank you so much for your effort and the valuable time you put into it. Thanks – Bhavesh Odedra Sep 25 '15 at 14:35

Vineesh · Answer 5 · 2015-09-25T14:31:28.270

1

Try this code:

import re
a = """<p>Dear Customer,</p><p>This notice serves as proof of delivery for the shipment listed below.</p><dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd><dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd><dt><label>Delivered On:</label></dt><dd>12/4/2015 11:07 A.M.</dd><dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt><dt><label>Left At:</label></dt><dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>"""
data = re.search('Delivered On:</label></dt><dd>(.*)$', a)
# Only take the value when it starts with a digit, i.e. the date is not empty
if data and data.group(1)[:1].isdigit():
    print(data.group(1)[:20])

edited Sep 25 '15 at 14:31

answered Sep 25 '15 at 13:40

Vineesh

I added it, but it gives me this output => ' – Bhavesh Odedra Sep 25 '15 at 13:46
@Odedra, I have added one more check to the answer. Can you please try it? – Vineesh Sep 25 '15 at 14:32
