
I'm trying to extract the text that sits between two known substrings in a string.

Following the question "Find string between two substrings", I was able to capture the text like this:

import re

s = 'asdf=5;iwantthis123jasd'
result = re.search(r'asdf=5;(.*)123jasd', s)
print(result.group(1))

But it fails on the multi-line string below.

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


result = re.search('<span class="discount-rate">(.*)</span>', s)
print(result.group(1))

I'm trying to extract '4%'. My other patterns work, but I can't tell why this one fails. Any help would be appreciated.

anfwkdrn
  • If you have a lot of information like this, you may want to look into my answer, rather than treating everything like strings. – BeRT2me Aug 13 '22 at 03:57

3 Answers


Try this (note the whitespace and newlines around the value, which \s* consumes):

import re
s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


result = re.search(r'<span class="discount-rate">\s*(.*?)\s*</span>', s)
print(result.group(1))
Meh

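By default, "." does not match a newline, so (.*) cannot span the line breaks between the tags. Pass the re.DOTALL flag so that "." matches newlines too, and strip the surrounding whitespace from the captured group:

result = re.search(r'<span class="discount-rate">(.*?)</span>', s, re.DOTALL)
print(result.group(1).strip())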

Documentation: https://docs.python.org/3/library/re.html
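
A minimal sketch of the difference, using the string from the question. The helper name between is made up for illustration and is not part of the re module; it escapes both delimiters with re.escape, uses a non-greedy group so the match stops at the first closing delimiter, and strips the surrounding whitespace.

```python
import re

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

# Without re.DOTALL, "." stops at the newline after the opening tag,
# so the pattern can never reach "</span>" and re.search returns None.
print(re.search(r'<span class="discount-rate">(.*)</span>', s))


def between(text, start, end):
    """Hypothetical helper: stripped text between start and end, or None."""
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


print(between(s, '<span class="discount-rate">', '</span>'))
```

Returning None instead of calling .group(1) directly avoids the AttributeError that the original code raises when nothing matches.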

Daniel

This is structured data, not just a string, so we can use a library like Beautiful Soup to help us simplify such tasks:

from bs4 import BeautifulSoup

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly to avoid a warning
value = soup.find(class_='discount-rate').get_text(strip=True)
print(value)

# Output:
4%
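
If installing Beautiful Soup is not an option, roughly the same idea can be sketched with the standard library's html.parser. This is an illustration under assumptions rather than part of the approach above: the class name ClassTextCollector is made up, and it only looks at <span> elements, which is all the sample markup needs.

```python
from html.parser import HTMLParser

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collect the text of each <span> with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # greater than 0 while inside a matching <span>
        self.values = []    # one text entry per matching <span>

    def handle_starttag(self, tag, attrs):
        if tag != 'span':
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.depth or self.wanted_class in classes:
            self.depth += 1
            if self.depth == 1:
                self.values.append('')

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data


parser = ClassTextCollector('discount-rate')
parser.feed(s)
parser.close()
values = [v.strip() for v in parser.values]
print(values)
```

This takes more code than the Beautiful Soup version, so it is mainly worth considering when a third-party dependency has to be avoided.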
BeRT2me