
I'm trying to use this word list I found for a somewhat ambitious hangman game for a Discord bot. It recommends using the .json version of the file as a dictionary if you're using Python, which I am. The only problem is that it takes forever to load (parse?), presumably because it has 370,102 lines, and since this is going to run on a Raspberry Pi, that probably isn't going to work out very well.

What would be the best way to go about this? I'm new to Python and to programming in general, so I'm not quite sure how to approach it. Would it be faster if I used C? Maybe I could use an array somehow?

It doesn't have to be in a dictionary, it's just that the file was provided like that.

Shidouuu
  • You only have to pay that cost when your script starts; once it's loaded, you can access it quickly. Also, what is "forever"? – Jared Smith Dec 16 '20 at 19:33
  • Why does it need to be a dictionary? What are you associating with each dictionary item? – Random Davis Dec 16 '20 at 19:33
  • @RandomDavis It would need fast key lookup, [a set](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset), which is often implemented with a dictionary. – Schwern Dec 16 '20 at 19:34
  • @Schwern why does a key lookup need to happen? I'd think a list would be faster, since you just need a flat list of words, and randomly grab any element from the list with each new round. If you need to store any information with a word, I'd maybe create a new dictionary to hold it, but only for words that have been played, rather than the whole list of all of them. – Random Davis Dec 16 '20 at 19:35
  • @Schwern Python *has* a `set` type; there's no need to simulate one using a `dict`. – chepner Dec 16 '20 at 19:35
  • @RandomDavis Yeah, I guess you only need one word. I've added the one-pass random line algorithm to my answer. – Schwern Dec 16 '20 at 19:36
  • @chepner Yes, that doesn't materially change the problem. – Schwern Dec 16 '20 at 19:37
  • Is it any faster if you save and load it with pickle? – gilch Dec 16 '20 at 19:38
  • @RandomDavis Sorry, I should have specified that it doesn't have to be a dictionary; I'll edit that in now. I probably could use a list now that I think about it; I'd just use find and replace to replace the columns with commas. – Shidouuu Dec 16 '20 at 19:45
  • @JaredSmith Sorry, should've mentioned that the owner of the bot I'm modifying would probably get mad at me if I included it in the main file, so it needs to be imported every time the function is called. – Shidouuu Dec 16 '20 at 19:47
  • @Shidouuu yeah that's important info, should probably edit that in to the question. – Jared Smith Dec 16 '20 at 19:52
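Jared Smith's point is that the loading cost is paid once per process. Even if the word list has to be loaded from inside the bot's function, a small cache can keep it that way. A minimal sketch, assuming the JSON file is an object whose keys are the words and using the hypothetical file name `words_dictionary.json`; `load_words` and `random_word` are illustrative names, not part of any library or of the bot:

```python
import functools
import json
import random

# Hypothetical file name: assumed to be a JSON object whose keys are the words.
WORDS_JSON = "words_dictionary.json"


@functools.lru_cache(maxsize=None)
def load_words(path=WORDS_JSON):
    """Parse the JSON file on the first call only; later calls reuse the result."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Only the keys (the words) are needed. A tuple is immutable,
    # so it is safe to hand the same cached object to every caller.
    return tuple(data)


def random_word(path=WORDS_JSON):
    """Pick one word for a new round without re-reading the file."""
    return random.choice(load_words(path))
```

The first call pays the full parsing cost; later calls in the same process return the cached tuple. If the bot process itself is restarted for every game, this does not help, and an on-disk approach is a better fit.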

2 Answers


For something as basic as a dictionary lookup, put the words into a dbm file and then read from that. This effectively works as an on-disk dictionary, letting you look up keys and their values quickly without having to load the whole thing into memory.

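```python
import dbm

with dbm.open('cache', 'r') as db:  # open an existing dbm file read-only
    if 'apple' in db:              # str keys are encoded to bytes automatically
        print(db['apple'])         # values come back as bytes
```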
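The snippet above assumes the dbm file `'cache'` already exists. One way to build it from a plain one-word-per-line list is sketched below; the function names and paths are hypothetical, and only `dbm.open` with its `'n'` (new) and `'r'` (read-only) flags comes from the standard library:

```python
import dbm


def build_word_db(words_txt, db_path):
    """Copy a one-word-per-line text file into a dbm file (hypothetical helper)."""
    # 'n' always creates a new, empty database, replacing any existing one.
    with dbm.open(db_path, "n") as db, open(words_txt, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:
                # dbm stores bytes; str keys and values are encoded automatically.
                db[word] = "1"


def is_word(db_path, word):
    """Check a single word without loading the whole list into memory."""
    with dbm.open(db_path, "r") as db:
        return word in db
```

Building the file is a one-time step. After that, `is_word` only opens the file and checks a single key.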

For anything more complicated, use SQLite, a stand-alone SQL database that Python supports through its built-in sqlite3 module.
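A minimal sketch of the SQLite route using the standard-library `sqlite3` module. The table name `words`, its single `word` column, and the function names are assumptions made for illustration:

```python
import sqlite3


def build_word_table(words_txt, db_path):
    """Copy a one-word-per-line file into an SQLite table (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits the transaction if the block succeeds
            con.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
            with open(words_txt, encoding="utf-8") as f:
                con.executemany(
                    "INSERT OR IGNORE INTO words (word) VALUES (?)",
                    ((line.strip(),) for line in f if line.strip()),
                )
    finally:
        con.close()


def is_word(db_path, word):
    """Look a word up through the primary-key index instead of scanning a list."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute("SELECT 1 FROM words WHERE word = ?", (word,)).fetchone()
    finally:
        con.close()
    return row is not None


def random_word(db_path):
    """Let SQLite choose one random row so Python never holds the whole list."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT word FROM words ORDER BY RANDOM() LIMIT 1"
        ).fetchone()
    finally:
        con.close()
    return row[0] if row else None
```

`ORDER BY RANDOM()` still scans the whole table inside SQLite, so picking a word is not free on a Raspberry Pi, but the list never has to be built as a Python object.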


However, as RandomDavis points out, you don't need to load the whole word list. You just need to pick one word at random per game. This can be done in a single read of the file.
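A common way to pick a random line in a single read is reservoir sampling with a reservoir of size one. The sketch below illustrates that general technique and is not necessarily the exact code Schwern used; `random_line` is a hypothetical name:

```python
import random


def random_line(path):
    """Pick one line uniformly at random in a single pass over the file."""
    chosen = None
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            # Line n replaces the current choice with probability 1/n.
            # The first line is always taken, and at the end every line
            # has been kept with equal probability.
            if random.randrange(n) == 0:
                chosen = line.rstrip("\n")
    return chosen  # None if the file is empty
```

The whole file is still read once per call, so the time grows with its size, but only the current choice is kept in memory.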

The file can be compressed and still read line by line.
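Python's `gzip` module can open a compressed file in text mode and iterate over it like an ordinary file. A sketch, assuming a gzip-compressed copy of the list such as the hypothetical `words_alpha.txt.gz`; `iter_words_gz` is an illustrative name:

```python
import gzip


def iter_words_gz(path):
    """Yield one word per line from a gzip-compressed text file."""
    # "rt" = read in text mode: data is decompressed on the fly, so neither
    # the whole file nor an uncompressed copy has to be held in memory.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:  # skip blank lines
                yield word
```

`iter_words_gz` can stand in for a plain `open()` call in any loop that reads the list line by line. Decompression costs some CPU, so on a Raspberry Pi it is worth timing both versions.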

See if it's fast enough for your purposes. If it isn't, perhaps you could run a thread which loads the next word in the background while they're working on the first one.
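One way to load the next word in the background is a single-worker thread pool from `concurrent.futures`. `WordPrefetcher` and `pick_word` are hypothetical names; `pick_word` simply reads the list and chooses one word, and any other picker could be dropped in:

```python
import concurrent.futures
import random


def pick_word(path):
    """Hypothetical picker: read the list and choose one word."""
    with open(path, encoding="utf-8") as f:
        return random.choice(f.read().splitlines())


class WordPrefetcher:
    """Keep the next game's word loading in a background thread (sketch)."""

    def __init__(self, path):
        self._path = path
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        # Start loading the first word right away.
        self._next = self._pool.submit(pick_word, path)

    def get_word(self):
        # Wait for the word loading in the background (often already done)...
        word = self._next.result()
        # ...then immediately start loading the word for the following game.
        self._next = self._pool.submit(pick_word, self._path)
        return word

    def close(self):
        """Wait for any pending load and stop the worker thread."""
        self._pool.shutdown(wait=True)
```

If the bot runs on asyncio (discord.py does), `loop.run_in_executor` or, on Python 3.9+, `asyncio.to_thread` serves the same purpose without blocking the event loop.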

Schwern

It's not necessary to use a dictionary in this case; you're just picking a random word from a list. If you use words_alpha.txt from that GitHub repo and load it as a list, it's super fast:

```python
import random

with open('words_alpha.txt') as words_file:
    words = words_file.read().splitlines()

print(random.choice(words))
```

The above takes less than half a second on my machine. The random.choice() call itself is blazing fast; any slowness comes from reading the file.
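To check whether the same holds on a Raspberry Pi, the two steps can be timed separately with `time.perf_counter()`. A small sketch; `time_load_and_pick` is a hypothetical helper:

```python
import random
import time


def time_load_and_pick(path):
    """Return (seconds spent reading the file, seconds spent in random.choice)."""
    start = time.perf_counter()
    with open(path, encoding="utf-8") as words_file:
        words = words_file.read().splitlines()
    loaded = time.perf_counter()
    random.choice(words)
    picked = time.perf_counter()
    return loaded - start, picked - loaded
```

For example, `print(time_load_and_pick('words_alpha.txt'))` prints the read time and the choice time in seconds.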

Random Davis