I want to capture the text from the page linked below and save it: http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=CI&version=44&glossary=0
I only need the text after .A, not the rest of the page. There are also 50 different links at the top of the page, and I want to get the data from all of them.
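To make the ".A" requirement concrete, here is a minimal sketch of the filtering step on its own. It assumes that "the text after .A" means the lines that begin with ".A". The sample text is made up for illustration and is not copied from the real page, and the function name extract_a_records is hypothetical.

```python
def extract_a_records(text):
    """Return the lines of `text` that start with '.A' (assumed meaning of 'after .A')."""
    return [line.rstrip() for line in text.splitlines() if line.startswith(".A")]


# Made-up sample standing in for the product text; the real content will differ.
SAMPLE_TEXT = """HEADER LINE
RR5FWD

.A STATION1 0101 C DH00/TX 55
: a comment line that should be skipped
.A STATION2 0101 C DH00/TX 57
"""

records = extract_a_records(SAMPLE_TEXT)
for record in records:
    print(record)
```

If the wanted part is instead everything from the first ".A" to the end of the product, the filter would need to change accordingly.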
I wrote the code below, but it returns nothing. How can I get specifically the part that I need?
import urllib
import re
htmlfile = urllib.urlopen("http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=CI&version=1&glossary=0")
htmltext = htmlfile.read()
regex = '<pre class="glossaryProduct">(.+?)</pre>'
pattern = re.compile(regex)
out = re.findall(pattern, htmltext)
print(out)
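One likely reason the regex above finds nothing: by default "." does not match a newline, and the content of the pre element spans many lines, so (.+?) can never reach the closing tag. A minimal sketch of the same pattern compiled with re.DOTALL follows. The sample HTML is made up to stand in for the downloaded page (the real markup may differ), and the function name extract_pre_blocks is hypothetical.

```python
import re

# Made-up stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """<html><body>
<pre class="glossaryProduct">
HEADER LINE
.A STATION1 0101 C DH00/TX 55
.A STATION2 0101 C DH00/TX 57
</pre>
</body></html>"""

# re.DOTALL lets "." match newlines, so the group can span several lines.
PRE_PATTERN = re.compile(r'<pre class="glossaryProduct">(.+?)</pre>', re.DOTALL)


def extract_pre_blocks(html_text):
    """Return the text inside every <pre class="glossaryProduct"> element."""
    return PRE_PATTERN.findall(html_text)


blocks = extract_pre_blocks(SAMPLE_HTML)
print(len(blocks))
print(blocks[0])
```

Under Python 3, read() returns bytes, so the page would have to be decoded (for example with .decode("utf-8")) before matching it against a str pattern.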
I also tried the following, which returns the entire content of the page:
import urllib
file1 = urllib.urlopen('http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=txt&version=1&glossary=0')
s1 = file1.read()
print(s1)
Can you help me do this?
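For the 50 links, a rough sketch of one possible approach is below. It rests on several assumptions: that the links differ only in the version parameter and run from 1 to 50, that the wanted lines begin with ".A", and that Python 3 is used (urllib.request.urlopen replaces Python 2's urllib.urlopen). The names build_urls, fetch_text, collect_records and save_records are hypothetical helpers, not part of any library.

```python
from urllib.request import urlopen  # Python 3; Python 2 used urllib.urlopen

# Assumption: only the "version" parameter changes between the 50 links.
BASE_URL = ("http://forecast.weather.gov/product.php"
            "?site=NWS&issuedby=FWD&product=RR5&format=txt"
            "&version={version}&glossary=0")


def build_urls(count=50):
    """Build the URLs for versions 1..count."""
    return [BASE_URL.format(version=v) for v in range(1, count + 1)]


def fetch_text(url):
    """Download one page and decode it to text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_records(urls, fetch=fetch_text):
    """Fetch every URL and keep only the lines that start with '.A'."""
    records = []
    for url in urls:
        for line in fetch(url).splitlines():
            if line.startswith(".A"):
                records.append(line.rstrip())
    return records


def save_records(path, count=50):
    """Write the '.A' lines from all versions to one text file."""
    with open(path, "w", encoding="utf-8") as out:
        for record in collect_records(build_urls(count)):
            out.write(record + "\n")
```

Calling save_records("records.txt") would then download all 50 versions and write the matching lines to a single file. The fetch argument exists so that the filtering can be tried with a fake downloader before hitting the real site.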