183

I need a database of every single valid word in English. I checked the /usr/share/dict/words file, but it contains fewer than 100k words. Wikipedia says English has 475k words. Where do I get the complete list (American spelling)?

Also, is there a single website that gives out words for other languages too, including Asian and European ones?

Edit: Forgot to add, I do not need names etc., just valid English words.

pigrammer
  • 20
    My `/usr/share/dict/words` has 479829 words, so maybe there is some variation here (and might be suitable for others). – marshall.ward Sep 18 '13 at 00:25
  • 8
    `wc -l /usr/share/dict/words` on Mac is **235,886 words** (July 2014 - OSX Mavericks 10.9.4) – nelsonic Jul 13 '14 at 22:18
  • 4
    http://www.freescrabbledictionary.com/english-word-list/ – Cesar Bielich Aug 03 '15 at 22:16
  • 2
    You can get a wordlist here: http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html. Look for the WORDLIST link on the right. – kofifus Jul 18 '16 at 07:35
  • 3
    Just in case anyone is still looking for this, I just got a good free Scrabble dictionary from https://www.wordgamedictionary.com/. – Chris Rae Jan 10 '19 at 23:04
  • the resource @james.garriss posted (thx!) is no longer there. Looks like the repo lives tho: https://github.com/dwyl/english-words – user2901351 Apr 09 '21 at 20:38
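
The counts quoted in the comments above come from `wc -l`, which counts every line, capitalised names included. Here is a minimal Python sketch, assuming a plain-text file with one entry per line, that reports both the total and the number of purely lowercase alphabetic entries (a rough stand-in for "not a name"). The function name `count_entries` is made up for illustration, and the path in the demo may not exist on your system.

```python
from pathlib import Path


def count_entries(path):
    """Return (total, lowercase_only) counts for a one-word-per-line file."""
    total = 0
    lowercase_only = 0
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            word = line.strip()
            if not word:
                continue  # skip blank lines
            total += 1
            # Proper names in these files are usually capitalised, so a
            # purely lowercase alphabetic entry is a rough "not a name" test.
            if word.isalpha() and word.islower():
                lowercase_only += 1
    return total, lowercase_only


if __name__ == "__main__":
    dict_path = Path("/usr/share/dict/words")  # location varies by system
    if dict_path.exists():
        total, lowercase_only = count_entries(dict_path)
        print(f"{total} entries, {lowercase_only} lowercase alphabetic")
```

The lowercase test is only a heuristic: it also drops acronyms and entries with apostrophes or hyphens, which may or may not be what you want.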

6 Answers

94

The WordNet database might be helpful. I once worked on a Firefox add-on that dealt with words and all kinds of associations between them, from simple to complicated. It looks like WordNet would be very useful to you.

Here it is in MySQL format. And this one (web-archived link) uses WordNet v3.0 data, rather than the older WordNet 2.0 data.

Graham
user266803
42

You can find what you need on infochimps.org.

They have a list of 350,000 simple (i.e. non-compound) words available for free download.

Word List - 350,000+ Simple English Words

Regarding other languages, you might want to poke around on Wiktionary. Here is a link to all the database backups. The information isn't organized very nicely, but if they have a language, you can download the data in SQL format.

Community
danben
  • 6
    The download link has changed - http://www.infochimps.com/datasets/word-list-350000-simple-english-words-excel-readable – Chris Rae Jan 13 '12 at 15:01
  • 48
    Annoyingly the infochimps file is **.xls** (an excel file with the words split across 6 worksheets!) ... I've extracted all **354986 words** into a **txt file**: https://github.com/nelsonic/english-words – nelsonic Jul 13 '14 at 22:33
  • 1
    @nelsonic Thanks a lot, the infochimps link is 404. –  Dec 20 '14 at 10:50
  • 1
    @ChrisRae both links not working – garg10may May 12 '16 at 15:24
  • 5
    seems like they include words with misspellings, like _tecnology_ - presumably because they collect everything that shows up on the web. so it's good for password cracking / validation, but not good for applications that require real words (like spell checkers, etc.). – max Jun 07 '16 at 02:14
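
One way to deal with the misspellings mentioned in the last comment is to keep only the words that also appear in a second, more conservative list. This is a minimal Python sketch of that idea, not something the infochimps data or the GitHub repo provides. It assumes both lists are plain text with one word per line, and the names `load_words` and `confirmed_words` are made up for illustration.

```python
def load_words(path):
    """Load a one-word-per-line text file into a set of lowercase words."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def confirmed_words(big_list, reference_list):
    """Keep only the words from big_list that also appear in reference_list."""
    return big_list & reference_list


if __name__ == "__main__":
    # Tiny in-memory demo; with real files you would call load_words() twice.
    big = {"technology", "tecnology", "word"}
    reference = {"technology", "word", "game"}
    print(sorted(confirmed_words(big, reference)))  # ['technology', 'word']
```

The trade-off is that genuine but rare words missing from the reference list are dropped too, so the result is only as complete as the smaller list.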
13

I do not see http://wordlist.sourceforge.net/ mentioned here, but that is where I would start if I were looking for something like this (and I was, when I stumbled over this question).

If you cannot find what you want there, and what you want is a list of English words, then you should probably spend some extra time describing how to recognize what it is that you want.

rdm
  • 1
    I was hopeful that these broader lists would contain words with punctuation, like "C++" or "C#", but couldn't find any. So if that's what you're after, you can short-circuit your search and skip this one (and the narrower lists in other answers). – hobs Apr 27 '16 at 18:53
  • @hobs Technically, "C++" is a C word (more likely from the B language), and not necessarily an English language word. It is actually defined as legal C grammar. True, English has borrowed it, but it isn't from a natural language. – SO_fix_the_vote_sorting_bug Feb 14 '22 at 14:52
  • @SO_fix_the_vote_sorting_bug I don't think that's true. English is a dynamic, informal language. There is no rigid, logical definition or category theory math expression or software program you can write to identify what is and is not an English word. You must create a statistical model for what *you* want in your list of words for your *application*. I think NL is a superset of all languages (formal and informal) because humans use them all to communicate with each other. – hobs Feb 20 '22 at 01:41
11

There's no such thing as a "complete" list. Different people have different ways of measuring -- for example, they might include slang, neologisms, multi-word phrases, offensive terms, foreign words, verb conjugations, and so on. Some people have even counted a million words! So you'll have to decide what you want in a word list.
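
To make "decide what you want" concrete, here is a minimal Python sketch of a filter you could run over any candidate list. The `WordPolicy` class, its fields, and the `accept` function are all hypothetical names chosen for illustration, and the rules are only examples of the kinds of decisions involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordPolicy:
    """Hypothetical switches describing what counts as a word for you."""
    allow_capitalised: bool = False  # proper names, acronyms
    allow_hyphen: bool = False       # e.g. "well-known"
    allow_apostrophe: bool = False   # e.g. "don't"
    min_length: int = 2


def accept(word, policy):
    """Return True if the word passes the given policy."""
    if len(word) < policy.min_length:
        return False
    if not policy.allow_capitalised and word != word.lower():
        return False
    # Characters that are tolerated in addition to letters.
    allowed_extra = set()
    if policy.allow_hyphen:
        allowed_extra.add("-")
    if policy.allow_apostrophe:
        allowed_extra.add("'")
    core = "".join(ch for ch in word if ch not in allowed_extra)
    # Whatever is left must be purely alphabetic (rejects spaces, digits).
    return core.isalpha()


if __name__ == "__main__":
    candidates = ["word", "London", "well-known", "ice cream", "don't"]
    strict = WordPolicy()
    print([w for w in candidates if accept(w, strict)])  # ['word']
```

Slang, neologisms, and offensive terms cannot be detected by spelling alone; for those you would need separate lists to include or exclude.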

JW.
  • 3
    Thanks for that link. A very enlightening read on just how many words there are in the English language, and the futility of trying to arrive at a definitive count of them. For a more concise and up-to-date read, there's also this: https://en.oxforddictionaries.com/explore/language-questions/how-many-words-are-there-in-the-english-language. – Hashim Aziz Mar 02 '17 at 19:30
  • 1
    @HashimAziz The issue is probably that there isn't an objective definition of "English," as it's just a consensus type of thing. One could make a list of "every utterance ever uttered by an English speaker while speaking English." But then you'd have to define "speaking English" and "English speaker." – SO_fix_the_vote_sorting_bug Feb 14 '22 at 14:46
4

You may check the *spell en-GB dictionary used by Mozilla, OpenOffice, and plenty of other software.

mloskot
  • The Mozilla link http://en-gb.pyxidium.co.uk/dictionary/en_GB.zip says "Server not found". Any update? Thanks. –  Dec 20 '14 at 11:01
  • @AMB Thx, I updated the link to point to alternative source of the dictionary at http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice – mloskot Dec 21 '14 at 10:51
  • And now the new link is 404, @mloskot. – james.garriss Jul 17 '15 at 17:20
  • @james.garriss I'm afraid the whole http://extensions.openoffice.org site seems to be down. – mloskot Jul 21 '15 at 11:03
  • en-gb.pyxidium.co.uk/dictionary/en_GB.zip can be found here: https://web.archive.org/web/20120210204607/http://en-gb.pyxidium.co.uk/dictionary/en_GB.zip (web archive) – nikssa23 Jul 19 '21 at 23:52
3

You didn't say what you need this list for. If something used as a blacklist for password checks is enough, cracklib might be good for you. It contains over 1.5M words.

Benjamin Bannier
  • 2
    No, not for a blacklist. I am doing some sort of word game/graph. –  Feb 06 '10 at 15:49
  • 1
    This has a lot of "junk words", however I'm still very grateful that you put this here - it's perfect when searching for specific words that the other dictionaries don't have (e.g. firetruck) – kangalio Oct 06 '19 at 15:52
  • @Benjamin Bannier How to extract words from this to something like a txt file? – Shahood ul Hassan Nov 23 '22 at 03:51
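
On the last question about getting the words into a txt file: here is a minimal Python sketch, assuming the download is a gzip-compressed plain-text file with one word per line. I have not checked that this holds for every cracklib release, so treat the format as an assumption. The function name `gzip_wordlist_to_txt` is made up for illustration.

```python
import gzip


def gzip_wordlist_to_txt(src_path, dest_path):
    """Decompress a gzip word list into a plain-text file; return word count."""
    count = 0
    # "rt" opens the compressed file in text mode so we can iterate by line.
    with gzip.open(src_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(dest_path, "w", encoding="utf-8") as dest:
        for line in src:
            word = line.strip()
            if word:  # drop blank lines
                dest.write(word + "\n")
                count += 1
    return count
```

You would call it as `gzip_wordlist_to_txt("words.gz", "words.txt")`, where both file names are placeholders for whatever you downloaded and wherever you want the output.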