Refer to the comments in your question to see why using requests might be a bad idea for counting how often a word appears in the visible text of a webpage (what you actually see in the browser). If you want to go about this with selenium, you could try:
from selenium import webdriver

url = 'https://www.gov.uk/government/publications/specialist-quality-mark-tender-2016'

driver = webdriver.Chrome(chromedriver_location)
driver.get(url)
body = driver.find_element_by_tag_name('body')

fr = []
wanted = ['tender', '2020', 'date']

for word in wanted:
    freq = body.text.lower().count(word)  # .lower() to account for count's case-sensitive behaviour
    dic = {'phrase': word, 'frequency': freq}
    fr.append(dic)
    print('Frequency of', word, 'is:', freq)
which gave me the same results as a CTRL + F search.
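One thing to keep in mind is that count matches substrings, so 'date' also counts occurrences like 'dates' or 'updated' (CTRL + F behaves the same way). If you only want whole-word hits, a word-boundary regex is one option; here is a minimal sketch, reusing the body and wanted variables from the snippet above:

import re

# count() tallies substrings; word-boundary regexes restrict the tally
# to whole-word occurrences instead.
text = body.text.lower()
for word in wanted:
    whole_word_freq = len(re.findall(r'\b' + re.escape(word) + r'\b', text))
    print('Whole-word frequency of', word, 'is:', whole_word_freq)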
You can test BeautifulSoup too (which you're importing already, by the way) by modifying your code a little bit:
import requests
from bs4 import BeautifulSoup

url = 'https://www.gov.uk/government/publications/specialist-quality-mark-tender-2016'

fr = []
wanted = ['tender', '2020', 'date']

a = requests.get(url).text
soup = BeautifulSoup(a, 'html.parser')

for word in wanted:
    freq = soup.get_text().lower().count(word)
    dic = {'phrase': word, 'frequency': freq}
    fr.append(dic)
    print('Frequency of', word, 'is:', freq)
That gave me the same results, except for the word tender, which according to BeautifulSoup appears 12 times, not 11. Test them out for yourself and see what suits you.
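The extra hit for tender most likely comes from text that BeautifulSoup extracts but the browser never renders, such as the page title or script content. If you only care about rendered text, one way to narrow the gap is to drop those tags before calling get_text(). A minimal sketch (the list of tags to strip is an assumption, not exhaustive):

import requests
from bs4 import BeautifulSoup

url = 'https://www.gov.uk/government/publications/specialist-quality-mark-tender-2016'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Drop elements whose text a browser does not display, so get_text()
# comes closer to the visible text that CTRL + F searches.
for tag in soup(['head', 'script', 'style', 'noscript']):
    tag.decompose()

visible_text = soup.get_text().lower()
for word in ['tender', '2020', 'date']:
    print('Frequency of', word, 'is:', visible_text.count(word))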