2

I have a class that needs a list of words to be excluded from a string:

class Cleaner():
     def __init__(self, remove_words=None):
         self.remove_words = remove_words

     def clean(self, line):
         return u' '.join[word for word in line not in self.remove_words]

and on the main file I need to read the words to be removed from the lines:

if __name__ == "__main__":

     with open('remove_words') as r:
         words = r.read().splitlines()

     cleaning = Cleaner(words)

     with open('mylines') as f:
          lines = f.read()
          for line in lines:
               print cleaning.clean(line)

So I need to open the remove_words file before creating the Clean class. But alas, I need to open several files with words to be removed and the code get messy fast. So I added a class to set the removable words on the Clean class:

 class Cleaner():
     def __init__(self, remove_words=None):
         self.remove_words = remove_words

     def set_remove_words(self, words):
         self.remove_words = words  

     def clean(self, line):
         return u' '.join[word for word in line not in self.remove_words]

so now the main code would look like

if __name__ == "__main__":

     with open('remove_words') as r:
         words = r.read().splitlines()
     # after lots of these open files...
     with open('remove_more_words') as r:
         more_words = r.read().splitlines()

     cleaning = Cleaner()
     all_removable_words = words + more_worlds
     cleaning.set_remove_words(all_removable_words)

     with open('mylines') as f:
          lines = f.read()
          for line in lines:
               print cleaning.clean(line)

but then again things can get very messy. There are cases when I will have to open and pass just one list of removable words, sometimes will be several. What would be a "pythonic" solution for this? Would move the filenames with removable words be passed to the constructor and build the lists there be more "pythonic" and less error prone? Where should the exceptions be handled?

Ivan
  • 19,560
  • 31
  • 97
  • 141
  • so is your class `Cleaner` or `Clean`? you seem to be implementing a `Cleaner` class but initiating `Clean` objects.... – R Nar Nov 25 '15 at 00:29
  • and i think if you want to just add more words, you can use `list.extend(otherlist)` and define a `add_more_words` function that takes in a list and calls something like `remove_words.extend(more_words)`. This would mean having a default value of an empty list instead of `None` so that it can work for if you initiated the object with no list – R Nar Nov 25 '15 at 00:32
  • 1
    additionally, I would use a `set()` instead of a list since they are more efficient for checking if a value is in the set. – R Nar Nov 25 '15 at 00:32

2 Answers2

1

First off, good job on keeping your file I/O out of your class. I like how you adhere to Uncle Bob's clean architecture principle. You should absolutely not move it into the constructor, since that would couple your domain-rules code to the open function, and therefore make it less reusable.

I would make use of list comprehensions and generators to be Pythonic.

if __name__ == "__main__":

     bad_word_sources = ['remove_words',...,'remove_more_words']
     bad_word_files = (open(source) for source in bad_word_sources)       
     bad_words = [word for word in chain(bad_word_files)]

     cleaning = Cleaner(bad_words)

This works because the open() function provides an __iter__ implementation that will work similar to

[line for line in file.readlines()]

When the open object is exhausted it will take care of closing itself[reference needed].

I'm not sure which kind of exceptions you would like to handle, could you be more specific on this?

Also note that set_words methods are considered unpythonic. Just set the attribute directly if you have to.


On a side note this class only has 2 methods, one of which is __init__:

 class Cleaner():
     def __init__(self, remove_words=None):
         self.remove_words = remove_words

     def clean(self, line):
         return u' '.join[word for word in line not in self.remove_words]

The Pythonic way of making this reusable is to ditch the class and put it in a module:

cleaner.py

def clean(line, bad_words):
    return u' '.join(word for word in line if line not in self.bad_words)

You can then use it like:

from cleaner import clean

instead of:

from cleaner import Cleaner
mycleaner = Cleaner(bad_words)
mycleaner.clean(line)

which is really confusing.

Community
  • 1
  • 1
Sebastian Wozny
  • 16,943
  • 7
  • 52
  • 69
  • Thanks! I like the approach with the generators for reading the files. The problem with the module is that 1. this class has more methods and 2. I'm trying hard to keep the methods with just one input because I'll need to apply these methods with the `applymap` method from pandas and it's much easier if there is only one parameter. – Ivan Nov 25 '15 at 09:56
  • BTW, awesome link to Stop Writing Classes. – Ivan Nov 25 '15 at 10:11
  • You can use `functools.partial` for this :)https://docs.python.org/2/library/functools.html#functools.partial – Sebastian Wozny Nov 25 '15 at 10:30
1

There are cases when I will have to open and pass just one list of removable words, sometimes will be several. What would be a "pythonic" solution for this?

I think what you need is dynamic parameters/arguments.

class Cleaner():
     def __init__(self, *remove_words):
         self.remove_words = []
         [self.remove_words.extend(one) for one in remove_words]

     def clean(self, line):
         return u' '.join[word for word in line not in self.remove_words]

and there is another way, **argw, for a keyword arguments which you can specify a name to the argument dict.

holsety
  • 323
  • 1
  • 2
  • 13