
I want to know if a word is in the dictionary.

Here is what I am trying.

import requests

def word_in_dictionary(word):
    response = requests.get('https://en.wiktionary.org/wiki/'+word)
    return response.status_code==200

print(word_in_dictionary('potato')) # True
print(word_in_dictionary('nobblebog')) # False

But unfortunately the dictionary contains a lot of words that are not English and I don't want to match those.

print(word_in_dictionary('bardzo')) # WANT THIS TO BE FALSE

So I tried to look in the content.

def word_in_dictionary(word):
    response = requests.get('https://en.wiktionary.org/wiki/'+word)
    return response.status_code==200 and 'English' in response.content.decode()

But I am still getting True. It is finding "English" somewhere in the page source even though the rendered page doesn't have it (nothing when I search with ctrl-F in the browser).

How can I make it only return True if it is actually listed as having a meaning in English?
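
One way to see where the stray match comes from is to print the text around each occurrence of "English" in the raw page source. This is only a minimal sketch: the helper name `match_contexts` and the sample HTML are made up for illustration and are not taken from a real Wiktionary page.

```python
def match_contexts(text, needle, width=30):
    """Return a snippet of surrounding text for every occurrence of needle."""
    snippets = []
    start = text.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(text), start + len(needle) + width)
        snippets.append(text[lo:hi])
        start = text.find(needle, start + 1)
    return snippets


# Made-up page source: "English" appears only inside an attribute value,
# so a Ctrl-F search of the rendered page would not find it.
sample = '<a href="/wiki/x" title="English Wikipedia">link</a><p>bardzo</p>'
for snippet in match_contexts(sample, 'English'):
    print(repr(snippet))
```

Running the same helper on `response.content.decode()` would show whether the hits sit in attributes, scripts, or navigation links rather than in the entry itself.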

vahdet
upyop
  • If the word "English" is in some consistent spot in the page, that would probably be a better way to check than looking at the entire page (prone to errors, as you found out). – Mateen Ulhaq Aug 28 '20 at 11:07
  • In your browser, try "View page source" and then search for "English"; it's probably there. Or save the decoded content to a file and check it in an editor like Notepad++. – DisappointedByUnaccountableMod Aug 28 '20 at 11:07
  • Possible duplicate of [How to retrieve Wiktionary word content?](https://stackoverflow.com/questions/2770547/how-to-retrieve-wiktionary-word-content) – mkrieger1 Aug 28 '20 at 11:09
  • Use something like BeautifulSoup to parse the HTML, then check whether there's the word "English" in an `<h2>` tag? – Jiří Baum Aug 28 '20 at 11:10
  • As seen on the duplicate question, using Wiktionary for this purpose is problematic. Maybe you can download a list of English words? For example: https://github.com/dwyl/english-words/ – Thomas Aug 28 '20 at 11:12
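
The word-list suggestion in the last comment could look roughly like the sketch below. It assumes a plain-text list with one word per line; the function names `load_words` and `word_in_list` are hypothetical, and the tiny in-memory list stands in for a downloaded file.

```python
def load_words(lines):
    """Build a lowercase set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def word_in_list(word, words):
    """Case-insensitive membership test against the loaded set."""
    return word.lower() in words


# Tiny in-memory stand-in for a real list. With a downloaded file you would
# pass the open file object instead of this list (the path is up to you).
words = load_words(['potato\n', 'tomato\n', '\n'])
print(word_in_list('Potato', words))
print(word_in_list('bardzo', words))
```

A set gives constant-time lookups and needs no network request per word, at the cost of only knowing the words that happen to be in the list.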

1 Answer


Looking at the HTML, if the word has an English entry, the page contains a tag with id="English". You can try this code:

import requests
from bs4 import BeautifulSoup


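def word_in_dictionary(word):
    response = requests.get('https://en.wiktionary.org/wiki/' + word)
    if response.status_code != 200:
        return False  # no page at all for this word
    # An English entry shows up as an element with id="English"
    soup = BeautifulSoup(response.content, 'html.parser')
    return soup.select_one('#English') is not None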

print(word_in_dictionary('potato')) # True
print(word_in_dictionary('nobblebog')) # False
print(word_in_dictionary('bardzo')) # False
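
If installing BeautifulSoup is not an option, the same id check can probably be done with the standard library's `html.parser`. This is a sketch of a swapped-in technique, not the code above: `IdCollector` and `has_element_with_id` are hypothetical names, and the HTML fragments are made up rather than real Wiktionary markup.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the id attribute of every start tag."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'id' and value is not None:
                self.ids.add(value)


def has_element_with_id(html, element_id):
    """Return True if any element in the HTML text carries the given id."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return element_id in parser.ids


# Made-up fragments, not real Wiktionary markup.
english_page = '<h2 id="English">English</h2><p>A tuber.</p>'
other_page = '<h2 id="Polish">Polish</h2><a title="English Wikipedia">w</a>'

print(has_element_with_id(english_page, 'English'))
print(has_element_with_id(other_page, 'English'))
print('English' in other_page)  # the substring check still matches here
```

The second fragment shows why the id check is stricter than the substring check: "English" occurs in an attribute value, but no element has that id.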


Andrej Kesely