I am trying to make the code get only the text between the <p>
tags. I haven't found a way yet.
I've tried to use a simple loop. In this program you are supposed to enter a URL, and when you run it, it shows the plain text.
import urllib.request
import urllib.parse
import re
print("Enter the URL")
url = input()
#url = "https://en.wikipedia.org/wiki/Somalia"
values = {'s':'basic', 'submit':'search'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
#print(respData)
paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:
    print(eachP)
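A few things in the attempt above may explain why the loop prints little that is useful: `str(respData)` yields the `b'...'` repr of the bytes rather than decoded text, the pattern `<p>` does not match tags that carry attributes, and `.` does not cross newlines by default. Below is a minimal sketch of the regex approach with those points addressed. The helper name `paragraphs_from_html` and the sample string are made up for illustration, and a regex remains a rough tool for HTML.

```python
import html
import re

# Hypothetical helper name; not part of the original script.
def paragraphs_from_html(page):
    """Return the plain text of every <p>...</p> block in an HTML string."""
    # <p\b[^>]*> also matches tags with attributes, e.g. <p class="x">,
    # but not <pre>. re.DOTALL lets '.' span newlines.
    blocks = re.findall(r'<p\b[^>]*>(.*?)</p>', page,
                        flags=re.DOTALL | re.IGNORECASE)
    result = []
    for block in blocks:
        text = re.sub(r'<[^>]+>', '', block)  # drop nested tags such as <a> or <b>
        text = html.unescape(text)            # turn entities like &amp; into &
        text = ' '.join(text.split())         # collapse runs of whitespace
        if text:                              # skip empty paragraphs
            result.append(text)
    return result

# Made-up sample input, so the sketch runs without network access.
sample = ('<div><p class="lead">Hello <b>world</b> &amp; all</p>\n'
          '<p>Second\nline</p><p></p></div>')
for paragraph in paragraphs_from_html(sample):
    print(paragraph)
```

In the original script this would be called as `paragraphs_from_html(respData.decode('utf-8', errors='replace'))`, assuming the page is UTF-8. Note also that passing `data` to `urllib.request.Request` turns the request into a POST; leaving it out sends a plain GET, which is probably what is wanted for fetching a page.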
I have also tried to use BeautifulSoup, but I haven't even managed to import it.
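A common cause of the import failure is the naming: the package is installed as `beautifulsoup4` and imported with `from bs4 import BeautifulSoup`. If installing it is not an option, the standard library's `html.parser` can do the same job. The following is only a sketch under that assumption; the class name `ParagraphExtractor` and the function `extract_paragraphs` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical class name; a small stdlib-only stand-in for BeautifulSoup.
class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. automatically
        self.paragraphs = []
        self._inside = False
        self._chunks = []

    def _flush(self):
        # Store the paragraph collected so far, if it has any text.
        text = ' '.join(''.join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._flush()        # a new <p> implicitly closes an open one
            self._inside = True

    def handle_endtag(self, tag):
        if tag == 'p' and self._inside:
            self._flush()

    def handle_data(self, data):
        if self._inside:         # text of nested tags like <a> is kept too
            self._chunks.append(data)


def extract_paragraphs(page):
    parser = ParagraphExtractor()
    parser.feed(page)
    parser.close()
    parser._flush()              # pick up a trailing <p> that was never closed
    return parser.paragraphs


# Made-up sample input, so the sketch runs without network access.
page = ('<p>One <a href="#">link</a> here</p>'
        '<div>skip</div><p>Two &amp; three</p>')
for paragraph in extract_paragraphs(page):
    print(paragraph)
```

Unlike the regex, this handles attributes, nested tags and unclosed `<p>` elements without special cases.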
[How to find all text inside p elements in an HTML page using BeautifulSoup](https://stackoverflow.com/questions/10113702/how-to-find-all-text-inside-p-elements-in-an-html-page-using-beautifulsoup)
– Aaron_ab Jan 24 '19 at 07:55