I'm having trouble with web scraping in Python

Question

I'm very new to coding and I've tried to write code that fetches the current price of Litecoin from CoinMarketCap. However, I can't get it to work: it prints an empty list.

import urllib
import re

htmlfile = urllib.urlopen('https://coinmarketcap.com/currencies/litecoin/')

htmltext = htmlfile.read()

regex = 'span class="text-large2" data-currency-value="">$304.08</span>'

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print(price)

The output is "[]". The problem is probably minor, but I'd appreciate any help.

I did use single quotation marks in my code, but stack overflow converted "span class="text-large2" data-currency-value="">$304.08" to $304.08 straight away. — User2245, Dec 15 '17 at 23:38
Regular expressions are generally not the best tool for processing HTML. I suggest looking at something like [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/). That aside, your `regex` pattern probably doesn't do what you think it should. Review the [documentation](https://docs.python.org/3.4/library/re.html). — Galen, Dec 15 '17 at 23:49

Galen · Answer 1 · 2017-12-16T00:24:26.513

Regular expressions are generally not the best tool for processing HTML. I suggest looking at something like BeautifulSoup.

For example:

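import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead
import bs4

# Download the page and parse it with Python's built-in HTML parser.
f = urllib.urlopen("https://coinmarketcap.com/currencies/litecoin/")
soup = bs4.BeautifulSoup(f, "html.parser")
# Find the first tag that has a data-currency-value attribute and print its text.
print(soup.find(attrs={"data-currency-value": True}).text)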

This currently prints "299.97".

For this simple case it probably does not perform as well as a regular expression. However, see the question "Using regular expressions to parse HTML: why not?".
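If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's html.parser module (Python 3) in place of BeautifulSoup. This is only a rough illustration: the PriceParser class is a hypothetical helper, and the sample markup is copied from the question rather than fetched from the live page, which may look different.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Hypothetical helper: keeps the text of the first tag that has a
    data-currency-value attribute."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        has_attr = any(name == "data-currency-value" for name, _ in attrs)
        if self.price is None and has_attr:
            self._inside = True

    def handle_data(self, data):
        # Record the first run of text that follows the matching start tag.
        if self._inside:
            self.price = data.strip()
            self._inside = False


# Assumed sample markup, based on the span quoted in the question.
sample_html = '<div><span class="text-large2" data-currency-value="">$304.08</span></div>'

parser = PriceParser()
parser.feed(sample_html)
print(parser.price)  # $304.08
```

Unlike a literal regex, this does not depend on attribute order or exact spacing inside the tag.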

score 0 · Answer 2 · answered Dec 15 '17 at 23:52

You need to change your regex and add a group in parentheses to capture the value.

To match something like <span class="text-large2" data-currency-value>300.59</span>, use this regex:

regex = 'span class="text-large2" data-currency-value>(.*?)</span>'

The (.*?) group captures the number.

You get:

['300.59']
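A small offline check makes the difference visible. The sketch below assumes the markup quoted in the question (the live page may differ) and compares a literal pattern with one that uses a capture group. Note that an unescaped "$" in a regex is an end-of-string anchor, which by itself is enough to make the literal pattern return an empty list.

```python
import re

# Sample markup copied from the question; the live page may look different.
sample_html = '<span class="text-large2" data-currency-value="">$304.08</span>'

# An unescaped "$" anchors to the end of the string, so nothing after it
# can match and findall returns an empty list.
unescaped = re.findall(r'data-currency-value="">$304.08</span>', sample_html)

# Escaping "$" and wrapping the number in a lazy group captures the price,
# whatever its current value is.
captured = re.findall(r'data-currency-value="">\$(.*?)</span>', sample_html)

print(unescaped)  # []
print(captured)   # ['304.08']
```

Hard-coding the price in the pattern would also stop matching as soon as the price changes, which is another reason to capture it with a group.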
