I am trying to grab the contents of the response text, i.e. the date. If the content is anything other than a date, it should not be captured. Can someone help me with this? My current regex is: 'Renewal/Expiration Date:[^\d]([\d/])'

    <div class="textbkStyle">Renewal/Expiration Date:
        <div class="responseText">


                01/01/2019

        </div>
    </div>

The problem arises when the markup looks like this:

    <div class="textbkStyle">Renewal/Expiration Date:
        <div class="responseText">


                NOT AVAILABLE

        </div>
    </div>

The regex skips over the NOT AVAILABLE text and grabs the next date on the page that matches the format. Suggestions for resources to get better at regex are also appreciated.

tm_2906

2 Answers

Regex is not the best tool for this. I would use an HTML parser such as BeautifulSoup. Install it with pip install beautifulsoup4, then run:

    from bs4 import BeautifulSoup

    # Sample markup where the response text is a date
    raw_1 = '''
    <div class="textbkStyle">Renewal/Expiration Date:
            <div class="responseText">

                    01/01/2019

            </div>
        </div>
    '''

    # Sample markup where the response text is not a date
    raw_2 = '''
    <div class="textbkStyle">Renewal/Expiration Date:
            <div class="responseText">


                    NOT AVAILABLE

            </div>
        </div>
    '''

    # Find the responseText div and print its text with whitespace stripped
    soup = BeautifulSoup(raw_1, 'html.parser')
    print(soup.find('div', {'class': 'responseText'}).get_text(strip=True))

    soup_2 = BeautifulSoup(raw_2, 'html.parser')
    print(soup_2.find('div', {'class': 'responseText'}).get_text(strip=True))

Or, as a function:

    def get_response_text(raw):
        # Parse the markup and return the stripped text of the responseText div
        soup = BeautifulSoup(raw, 'html.parser')
        tag = soup.find('div', {'class': 'responseText'})
        return tag.get_text(strip=True)

    print(get_response_text(raw_1))
    print(get_response_text(raw_2))
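The parser returns whatever text the div contains, so a further check is needed to keep only real dates. A minimal sketch of that check, using only the standard library: the helper name `parse_date_or_none` is hypothetical, and the MM/DD/YYYY format is an assumption based on the sample value (01/01/2019 could equally be DD/MM/YYYY).

```python
from datetime import date, datetime
from typing import Optional

# Assumption: dates on the page are formatted MM/DD/YYYY.
DATE_FORMAT = "%m/%d/%Y"


def parse_date_or_none(text: str) -> Optional[date]:
    """Return a date if text is a valid date in DATE_FORMAT, otherwise None."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except ValueError:
        # Anything that is not a valid date (e.g. "NOT AVAILABLE") is rejected
        return None


print(parse_date_or_none("01/01/2019"))     # 2019-01-01
print(parse_date_or_none("NOT AVAILABLE"))  # None
```

The string passed in would be the extracted response text, so a value such as NOT AVAILABLE yields None instead of being mistaken for a date.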

Prayson W. Daniel

Although you shouldn't use regex for this, here is how you can:

    <div class=\"textbkStyle\">Renewal/Expiration Date:\s*<div class=\"responseText\">\s*(\d{2}/\d{2}/\d{4})\s*</div>\s*</div>

Your date will be available in capture group 1 (\1).

https://regex101.com/r/7Yn7zk/1
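As a rough sketch, the pattern above could be applied in Python with the standard `re` module. The function name `extract_renewal_date` and the "Issue Date" block in the sample page are made up for illustration; the quote escapes are dropped because they are not needed inside a Python raw string.

```python
import re

# Same pattern as above, split across two raw strings for readability
PATTERN = re.compile(
    r'<div class="textbkStyle">Renewal/Expiration Date:\s*'
    r'<div class="responseText">\s*(\d{2}/\d{2}/\d{4})\s*</div>\s*</div>'
)


def extract_renewal_date(html):
    """Return the date string from capture group 1, or None if there is no match."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical page: the renewal date is missing, but a later block has a date
page = '''
<div class="textbkStyle">Renewal/Expiration Date:
    <div class="responseText">
        NOT AVAILABLE
    </div>
</div>
<div class="textbkStyle">Issue Date:
    <div class="responseText">
        02/02/2020
    </div>
</div>
'''

print(extract_renewal_date(page))  # None
```

Because the label, both div tags, and the date must appear back to back with only whitespace between them, the later date in the page is not picked up.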

MonkeyZeus