
I have this line:

<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>

I would like to extract the price, 45.66, which sits between `data-asin-price="` and `" data-asin-shipping`.

I found this code, but it doesn't work very well.

def extractSubstring(text, sub1, sub2):
  pos1 = text.lower().find(sub1) + len(sub1)
  pos2 = text.lower().find(sub2)
  if pos1 > pos2 and pos2 > 0:
    return text[pos1:pos2]
  elif pos2 > pos1 and pos1 > 0:
    return text[pos2:pos1]
  elif pos1 > 0:
    return text[pos1:]
  elif pos2 > 0:
    return text[pos2:]

result = soup.find_all(attrs={"data-asin-currency-code": "USD"})
priceLine='<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'

sub1 = 'data-asin-price="'
sub2 = '" data-asin-shipping'

substring = extractSubstring(str(priceLine), sub1, sub2)
  • Use [regex](https://www.guru99.com/python-regular-expressions-complete-tutorial.html) – Richard Dunn Nov 28 '19 at 00:23
  • You can use `price = re.findall(r"\d+\.\d+", priceLine)` – AlexDotis Nov 28 '19 at 00:24
  • It's not clear why you aren't just using Beautiful Soup for this as well. It makes it very easy to [extract attributes](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes). Like: `soup.div['data-asin-price']` – Mark Nov 28 '19 at 00:24
  • I tried bs4 without success. I just found a way to extract it: `result = re.search(sub1+'(.*)'+sub2, text)`, so if anyone wants to answer this... – Martin Ocando Corleone Nov 28 '19 at 00:26
  • This may help: https://stackoverflow.com/questions/3368969/find-string-between-two-substrings – men6288 Nov 28 '19 at 00:35
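
The regex suggestions in the comments above can be sketched with only the standard library. This is a minimal sketch rather than a definitive solution: the names `price_line` and `price` are illustrative, and the sample line is shortened from the one in the question. It uses `[^"]*` instead of the greedy `(.*)`, on the assumption that other quoted attributes may follow on the same line.

```python
import re

# Shortened copy of the line from the question (illustrative)
price_line = (
    '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" '
    'data-asin-price="45.66" data-asin-shipping="0"></div>'
)

# [^"]* stops at the first closing quote, so the match cannot
# run on into later attributes
match = re.search(r'data-asin-price="([^"]*)"', price_line)

# Convert to float only when the attribute was actually found
price = float(match.group(1)) if match else None
print(price)
```

Anchoring on the attribute name, rather than matching any decimal number, keeps the pattern from picking up other numeric attributes if the markup changes.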

1 Answer


BeautifulSoup is the way to go:

import bs4
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
html = bs4.BeautifulSoup('<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>', 'html.parser')

Then:

print(html.div['data-asin-price'])
45.66
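
If a plain string search is preferred over a parser, the helper from the question can be repaired. With the sample line, the start index is smaller than the end index, so the original function takes the branch that returns `text[pos2:pos1]`, which is an empty slice. Below is a minimal sketch of a corrected version. The name `extract_between` and the shortened sample line are illustrative, and it assumes a case-sensitive match is acceptable.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    # Move past the start marker itself
    start += len(start_marker)
    # Look for the end marker only after the start marker
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


# Shortened copy of the line from the question (illustrative)
price_line = (
    '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" '
    'data-asin-price="45.66" data-asin-shipping="0"></div>'
)

price = extract_between(price_line, 'data-asin-price="', '" data-asin-shipping')
print(price)  # 45.66
```

This approach depends on the attribute order staying the same, which is why reading the attribute through a parser is usually the more robust choice.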
Ke Zhu