1

Here is a scenario:

  1. I get a list of sentences from a database (MySQL).
  2. I read a file into a dictionary.
  3. Process each sentence based on the dictionary.

My initial (horrible) approach was:

  1. Had two modules.
  2. ModuleA got the sentences from database and called functionB per sentence.
  3. FunctionB which is in ModuleB populates the dictionary from file. Each time. Then it processes the sentence.

This was done so because initially I was testing ModuleB separately. Later, when FunctionB was being called repeatedly I had to fix this. My solution was to move the population of dictionary from file to a separate function and set a global dictionary. Code goes like:

ModuleA

import ModuleB
def main():
    sentences = getSentencesFromDB()
    for (sentence in sentences):
        functionB(sentence)

ModuleB

dictionary = makeDictionaryFromFile()

def functionB(sentence):
    for word in sentence.split():
        #process sentence using dictionary

Now my questions are:

  1. Is this the correct solution to my problem? That is, does this ensure file is read only once?
  2. Is there a better way to do this (maybe without using global).
  3. When is the dictionary populated? On first call to functionB? Or when importing ModuleB?
datah4ck3r
  • 1,174
  • 1
  • 8
  • 19

3 Answers3

2

That's a fine solution; you definitely don't want to read from the file more than once (assuming you don't expect the file's contents to change mid-run).

You could call it _dictionary to suggest it's a private field others shouldn't access.

Alternatively you could use a memoizing @decorator to hide this behavior from callers, or wrap _dictionary and makeDictionaryFromFile() in a class, as chthonicdaemon suggested.

But for a one-off both of those options are overkill, just read the file into a field and use it.

Community
  • 1
  • 1
dimo414
  • 47,227
  • 18
  • 148
  • 244
2

To address your questions, in order.

  1. It's certainly a correct answer. The file will be read only once. There may be other correct answers as well. For a small program like this, your solution suffices.

  2. In order to work, the dictionary variable needs its value to persist from one call to functionB to the next, which rules out making dictionary local to that function. As alternatives, you could do as @dimo414 suggests and use a memoizing decorator, or you could define a class, and make the dictionary a property of the objects of that class (as @cthonicdaemon suggests). Another alternative would be to read the dictionary file into a database table, and then, in functionB query the database for the needed word. This will likely be more memory-efficient for very large dictionaries, at the cost of speed. Honestly, given the size of your program, all of these options seem like overkill. I'd suggest refactoring to one of them when it becomes clearer which one makes more sense, if that time ever comes. Also note that none of these are mutually-exclusive. You could do a combination of all three if you wanted.

  3. The dictionary, being a top-level variable in the module, is populated when the module is imported.

Ian McLaird
  • 5,507
  • 2
  • 22
  • 31
2

(1) The way you are doing it now ensures that the file is read only once. However, it makes it more tricky to re-read the file during the same execution of ModuleA without also executing all the other code in ModuleB

(2) I would consider using an object, something like this

class Parser(object):

    def __init__(self, filename='whatever the default filename is'):
        self.dictionary = makeDictionaryFromFile(filename)

    def functionB(self, sentence):
        for word in sentence.split():
            # process sentence using self.dictionary


def main():
    sentences = getSentencesFromDB()
    parser = Parser()
    for (sentence in sentences):
        parser.functionB(sentence)

(3) In your approach, dictionary is populated when ModuleB is imported.

chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66