
I have the following substring in the string str(dList):

"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>

I am trying to use re.search to pull out "MA" using this:

state = re.search(r'"addressRegion">\n\t\t\t\t\t\t\t\t\t(.+?)\n\t',str(dList))

However, that doesn't seem to work. I suspect this is because of the way the backslash ("\") is handled, but I can't figure out how to deal with it.
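Whether this is the cause depends on what dList holds, which the question doesn't say. If it is a list of plain strings (an assumption), str(dList) calls repr() on each element, and repr() writes newlines and tabs as the two-character sequences \n and \t. A raw pattern such as r'\n' matches only a real newline character, so it finds nothing, while r'\\n' matches a literal backslash followed by n. A minimal sketch with a hypothetical one-element dList:

```python
import re

# Hypothetical data: assume dList is a list of plain strings.
dList = ['"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>']

# str() of a list calls repr() on each element, which turns real
# newline/tab characters into backslash+n and backslash+t.
text = str(dList)
print("\n" in text)   # False: no real newline left
print("\\n" in text)  # True: literal backslash followed by n

# The original pattern looks for real newline/tab characters, so it fails.
original = re.search(r'"addressRegion">\n\t\t\t\t\t\t\t\t\t(.+?)\n\t', text)
print(original)  # None

# Doubling the backslash inside a raw string matches a literal backslash.
fixed = re.search(r'"addressRegion">\\n(?:\\t)+(.+?)\\n', text)
print(fixed.group(1))  # MA
```

If dList instead holds objects whose repr keeps real newlines, the original pattern would already match, so it is worth printing repr(str(dList)) first to see which case applies.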

krthkskmr

2 Answers


A regex isn't really necessary here. Let BeautifulSoup parse the markup and strip the whitespace:

In [22]: html = '<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'

In [23]: from bs4 import BeautifulSoup

In [24]: soup = BeautifulSoup(html, 'html.parser')

In [25]: soup.text
Out[25]: u'\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t'

In [26]: soup.text.strip()
Out[26]: u'MA'
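On a full page, soup.text returns all of the page's text, not just the state. A sketch of targeting the element directly, assuming (as in the example above) that the value sits in a <span class="addressRegion">; the surrounding <div> and the "otherField" span are made up for illustration. If the page marks the region with a different attribute, find() also accepts an attrs dictionary instead of class_.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the state span sits next to another (made-up)
# element, as it would on a real page.
html = ('<div><span class="otherField">ignored</span>'
        '<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span></div>')

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag, or None when nothing matches.
region = soup.find('span', class_='addressRegion')

# get_text(strip=True) strips the whitespace around each piece of text.
state = region.get_text(strip=True) if region is not None else None
print(state)  # MA
```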
OneCricketeer

Update: this is how you could do it if you really want to use a regex, but I think @cricket_007's solution is the better approach.

All you need to do is escape each backslash with another backslash. You can also collapse the repeated '\t' into a single group followed by '+':

>>> import re
>>> s = '"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'
>>> re.search('.*\\n(\\t)+(.*?)\\n(\\t)+.*', s).group(2)
'MA'
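A slightly more tolerant variation, offered as a sketch that assumes s contains real newline and tab characters as in the example above. In a normal string literal, '\\n' produces the same two characters as the raw literal r'\n', so re receives the same regex either way. Using \s, which matches newlines and tabs, removes the dependence on the exact number of tabs:

```python
import re

s = '"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'

# The escaped non-raw pattern and a raw pattern with single backslashes
# are the same string, so re compiles the same regex from both.
escaped = '.*\\n(\\t)+(.*?)\\n(\\t)+.*'
raw = r'.*\n(\t)+(.*?)\n(\t)+.*'
print(escaped == raw)  # True

# \s* absorbs any mix of newlines and tabs around the value.
match = re.search(r'"addressRegion">\s*(.*?)\s*</span>', s)
print(match.group(1))  # MA
```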
yurib