I'm scraping values from web page html that looks like this:

location=1">MAIN BUILDING</a> : -25.49<br />

I'm successfully using Python's partition twice: once to save everything after the ID string MAIN BUILDING</a> : and then again to save the part before <br />
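For reference, the two-step approach described above looks roughly like this. It is only a sketch: the variable names are illustrative, and the delimiters are copied from the sample line.

```python
# Sample line from the page; variable names here are illustrative.
s = 'location=1">MAIN BUILDING</a> : -25.49<br />'

# Step 1: keep everything after the ID string.
_, _, rest = s.partition('MAIN BUILDING</a> :')
# Step 2: keep everything before the closing tag.
value, _, _ = rest.partition('<br />')

# Strip the space left over after the colon.
value = value.strip()
print(value)
```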

Using partition twice doesn't seem too horrible, but is there a better way to do this? It seems like there ought to be a way to extract a string sandwiched between two other strings in one step rather than two.

user3217032
  • I suspect that the feature you want is a "regular expression", using the "capture" feature. Those should be enough for you to find an example on line, and avoid getting any answer disqualified for being a "duplicate". – Prune May 02 '17 at 17:24
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – unddoch May 02 '17 at 17:24
  • [Get a parser.](https://www.crummy.com/software/BeautifulSoup/) It'll be much more effective and reliable. – user2357112 May 02 '17 at 17:28

1 Answer

You can use str.index with Python string indexing for a one-liner:

>>> s = 'location=1">MAIN BUILDING</a> : -25.49<br />'
>>> begin = 'MAIN BUILDING</a> :'
>>> end = '<br />'
>>> s[s.index(begin) + len(begin):s.index(end)].strip()
'-25.49'

This assumes two things (if either delimiter is missing, str.index raises ValueError):

  1. The exact text that you included will delimit the string in question.
  2. Both pieces of text occur exactly once.
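
If those assumptions are too strict, the regular-expression capture suggested in the comments can be sketched as below. The helper name `between` is hypothetical, and this is one possible approach rather than a definitive implementation: it returns the first match only and gives None instead of raising when a delimiter is missing.

```python
import re

def between(text, begin, end):
    """Return the first stripped substring found between begin and end, or None."""
    # re.escape makes the delimiters match literally; (.*?) is a non-greedy
    # capture group, so the match stops at the first occurrence of `end`.
    match = re.search(re.escape(begin) + r'(.*?)' + re.escape(end), text)
    return match.group(1).strip() if match else None

s = 'location=1">MAIN BUILDING</a> : -25.49<br />'
print(between(s, 'MAIN BUILDING</a> :', '<br />'))
```

For anything beyond a single predictable line like this one, an HTML parser is likely to be more robust than string or regex matching.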
brianpck