
I am trying to get the date from

XXX = '''<div class="document-published-date">
                                July 14, 2018
                            </div>'''

I was expecting that something like this would work

re.search('>(.*?)</div>',XXX)

but I am getting an empty result.

DanielTheRocketMan

2 Answers


By default, the dot does not match a newline. Use the (?s) flag (equivalent to re.DOTALL) so that the dot matches newlines too. You also need to correct your regex slightly (remove the ] at the end of '>(.*?)]'), like this:

(?s)>\s*(.*?)\s*</div>

Explanation:

  • (?s) --> Enables dot to match new lines
  • > --> Matches > character literally
  • \s* --> Consumes any whitespace before intended text capture
  • (.*?) --> Capture your intended data
  • \s* --> Consumes any whitespace after the intended data (it is not part of the capture group)
  • </div> --> Matches this tag
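
A minimal sketch of the pattern above applied to the string from the question. The variable names match and date_text are only illustrative, and the triple-quoted literal is assumed so that the multi-line string is valid Python:

```python
import re

# The HTML fragment from the question, as a triple-quoted (multi-line) string.
XXX = '''<div class="document-published-date">
                                July 14, 2018
                            </div>'''

# (?s) lets "." match newlines; the \s* on each side of the group
# consumes the surrounding padding so only the date text is captured.
match = re.search(r'(?s)>\s*(.*?)\s*</div>', XXX)

# re.search returns None when nothing matches, so guard before using group().
date_text = match.group(1) if match else None
print(date_text)  # July 14, 2018
```

Passing flags=re.DOTALL to re.search instead of writing (?s) inside the pattern should behave the same way.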


Pushpesh Kumar Rajwanshi

It's probably not a good idea to use regexes like this on a regular basis. You could instead use a module like htmldate to extract the date from HTML documents (disclaimer: I'm the author). Here is how it could work:

1. Install the package (pip, pip3 or pipenv, your choice):

pip install -U htmldate

2. Retrieve a web page, parse it and output the date:

from htmldate import find_date

find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
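
If adding a dependency is not an option, the general idea of parsing the HTML rather than pattern-matching it can also be sketched with the standard library's html.parser. This is a different technique, not how htmldate works; the class name DateDivParser and the attribute dates are hypothetical, and the sketch assumes the date sits in a non-nested div with the class from the question:

```python
from html.parser import HTMLParser


class DateDivParser(HTMLParser):
    """Collects the text found inside <div class="document-published-date">."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True while we are inside the target div
        self.dates = []       # stripped text chunks found in target divs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; compare the class exactly.
        if tag == "div" and dict(attrs).get("class") == "document-published-date":
            self._inside = True

    def handle_endtag(self, tag):
        # Assumption: the target div contains no nested divs.
        if tag == "div":
            self._inside = False

    def handle_data(self, data):
        text = data.strip()
        if self._inside and text:
            self.dates.append(text)


html = '''<div class="document-published-date">
                                July 14, 2018
                            </div>'''

parser = DateDivParser()
parser.feed(html)
parser.close()
print(parser.dates)  # ['July 14, 2018']
```

Unlike the regex, this does not depend on how the whitespace or line breaks around the date are laid out.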
adbar