0

I need to parse the Device Time (e.g. 2012-01-17 13:12:09) from the text below using Python. Could you please tell me how to do this with Python's standard regular expression library? Thanks.

  <html><head><style type="text/css">h1 {color:blue;}h2 {color:red;}</style>
  <h1>Device #1   Root Content</h1><h2>Device Addr: 127.0.0.1:8080</h1>
  <h2>Device Time: 2012-01-17 13:12:09</h2></body></html>
F. Aydemir

4 Answers

2

Just to add

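import re
# html is assumed to hold the page source shown in the question
pattern = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')
first_match = pattern.search(html)
timestamp = first_match.group(1) if first_match else None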
Tim Pietzcker
Shadow
  • 1
    Although it happens to work here, it's better to use a raw string for regexes. I've edited your answer accordingly. If you get used to this convention, you can avoid a lot of grief later (for example, when your regex contains `\b`). – Tim Pietzcker Jan 17 '12 at 14:20
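To illustrate the point about raw strings, here is a minimal sketch of what happens with `\b`. The sample `text` value and the variable names are assumptions made up for this illustration.

```python
import re

text = "Device Time: 2012-01-17 13:12:09"

# In a normal string literal, "\b" is the backspace character (U+0008).
plain = "\bDevice"
# In a raw string the backslash survives, so the regex engine sees \b (word boundary).
raw = r"\bDevice"

# The non-raw pattern is one character shorter: backspace + "Device".
plain_len, raw_len = len(plain), len(raw)

plain_match = re.search(plain, text)  # searches for a literal backspace
raw_match = re.search(raw, text)      # searches for "Device" at a word boundary

print(plain_len, raw_len, plain_match, raw_match)
```

The date pattern in the answer contains only `\d`, which Python does not treat as a string escape, which is why it happened to work without the `r` prefix.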
1

You need this regex.

/Device Time: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})/

or this,

/Device Time: (\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)/

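Use this regular expression with the global switch on. In Python, drop the slash delimiters and call re.findall() or re.finditer() to get every match, since there is no global flag.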
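A possible Python rendering of the pattern above, as a sketch only. The `html` string is copied from the question, and the names `DEVICE_TIME`, `first`, and `all_times` are assumptions chosen for this example.

```python
import re

# Sample input taken from the question.
html = (
    '<h1>Device #1   Root Content</h1><h2>Device Addr: 127.0.0.1:8080</h1>\n'
    '<h2>Device Time: 2012-01-17 13:12:09</h2></body></html>'
)

# Same pattern as above, without the /.../ delimiters.
DEVICE_TIME = re.compile(r"Device Time: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

# search() returns the first match or None.
match = DEVICE_TIME.search(html)
first = match.group(1) if match else None

# findall() plays the role of the "global" switch: with one capture group
# it returns a list of the captured strings.
all_times = DEVICE_TIME.findall(html)

print(first)
print(all_times)
```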

Shiplu Mokaddim
1

Try this regex

Device Time: ([^<]+)

This returns everything after the words "Device Time: " up to the start of the next HTML tag. As shown in another answer, you could also match a more specific date-time format.
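A minimal sketch of this approach in Python, assuming the input looks like the snippet in the question. The variable names are illustrative.

```python
import re

# Last line of the sample from the question.
html = '<h2>Device Time: 2012-01-17 13:12:09</h2></body></html>'

# [^<]+ consumes characters until the next "<", i.e. the start of the closing tag.
match = re.search(r"Device Time: ([^<]+)", html)

# strip() guards against stray whitespace before the tag.
device_time = match.group(1).strip() if match else None

print(device_time)
```

Because the pattern does not check the format, it would also capture non-date text if the page ever put something else after the label.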

In general it's considered bad practice to parse HTML files with regex. However, your example is more like parsing normal text that happens to be part of an HTML file. In this case that's kind of fine... ;-)

bw_üezi
1

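Maybe like this:

import re

str = """ Your HTML String here"""  # note: this name shadows the built-in str

# Capture the run of spaces, digits, hyphens and colons after the label
pattern = re.compile(r"""Device Time:([ \d\-:]*)""")
s = pattern.search(str)

time = s.group(1)  # includes the leading space; call .strip() to remove it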
xueyumusic
  • How about parsing the time excluding the date? (e.g. 14:00:51) Thanks. – F. Aydemir Jan 17 '12 at 13:02
  • Maybe add: day_time = time.strip().split(' ')[1] – xueyumusic Jan 17 '12 at 13:14
  • The following does everything:
        pattern = re.compile(r"""Device Time:([ \d\-:]*)""")
        s = pattern.search(str)
        time = s.group(1).strip()
        print(time)
        pattern = re.compile(r'(\d{4}-\d{2}-\d{2})')
        s = pattern.search(time)
        date_ = s.group(1)
        print(date_)
        pattern = re.compile(r'(\d{2}:\d{2}:\d{2})')
        s = pattern.search(time)
        hour = s.group(1)
        print(hour)
    – F. Aydemir Jan 17 '12 at 13:16
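For the follow-up about getting the time without the date, one alternative to running three separate searches is a single pattern with named groups. This is only a sketch: the group names `date` and `time`, the use of `datetime.strptime`, and the shortened `html` sample are assumptions that do not appear in the thread.

```python
import re
from datetime import datetime

# Shortened sample based on the question's input.
html = '<h2>Device Time: 2012-01-17 13:12:09</h2>'

# Named groups capture the date and the time of day in one pass.
pattern = re.compile(
    r"Device Time:\s*(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})"
)

date_part = time_part = parsed = None
match = pattern.search(html)
if match:
    date_part = match.group("date")
    time_part = match.group("time")
    # Optional: turn the text into a datetime object for further processing.
    parsed = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")

print(date_part)
print(time_part)
```

Converting to a `datetime` also rejects impossible values such as month 13, which the regex alone would accept.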