
Context:

I've got this class that wraps a OneHotEncoder object by composition (a new vocabulary word for me):

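import numpy as np
from sklearn.preprocessing import OneHotEncoder
# cn: my own constants module (not shown); cn.ALL_LETTERS_ARRAY is a numpy array of characters

class CharEncoder:

    characters = cn.ALL_LETTERS_ARRAY  # class attribute, shared by every instance

    def __init__(self):
        # sparse_output=False returns dense arrays (this parameter was named `sparse` before scikit-learn 1.2)
        self.encoder = OneHotEncoder(sparse_output=False).fit(self.characters.reshape(-1, 1))
        self.categories = self.encoder.categories_[0].tolist()

    def transform(self, word):
        # one row per character, one column per known character
        word = np.array(list(word)).reshape(-1, 1)
        word_vect = self.encoder.transform(word)
        return word_vect

    def inverse_transform(self, X):
        # map each one-hot row back to its character and join them into a string
        word_arr = self.encoder.inverse_transform(X).reshape(-1,)
        return ''.join(word_arr)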
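The class above depends on a `cn` module that isn't shown. Here is a self-contained sketch of the same class for trying the round trip, assuming `cn.ALL_LETTERS_ARRAY` is roughly `string.ascii_letters + string.punctuation` as a numpy array of single characters (that stand-in is an assumption). To avoid depending on the `sparse`/`sparse_output` rename in scikit-learn, it keeps the encoder's default sparse output and calls `.toarray()` in `transform`.

```python
import string

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed stand-in for cn.ALL_LETTERS_ARRAY: ASCII letters plus punctuation.
ALL_LETTERS_ARRAY = np.array(list(string.ascii_letters + string.punctuation))


class CharEncoder:
    # Class attribute: shared by every instance of CharEncoder.
    characters = ALL_LETTERS_ARRAY

    def __init__(self):
        # Fit a single column whose categories are the known characters.
        self.encoder = OneHotEncoder().fit(self.characters.reshape(-1, 1))
        self.categories = self.encoder.categories_[0].tolist()

    def transform(self, word):
        # One row per character; .toarray() densifies the default sparse output.
        word = np.array(list(word)).reshape(-1, 1)
        return self.encoder.transform(word).toarray()

    def inverse_transform(self, X):
        # Map each one-hot row back to its character and rejoin them.
        word_arr = self.encoder.inverse_transform(X).reshape(-1)
        return "".join(word_arr)


enc = CharEncoder()
X = enc.transform("Hello!")
print(X.shape)                   # (6, 84)
print(enc.inverse_transform(X))  # Hello!
```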

As you can see, it has a class attribute, `characters`, which is essentially an array of all the ASCII letters plus some punctuation.

I want to make this CharEncoder class useful for more than just ASCII. Maybe someone else would really like to use a different character set, and I want to allow them to do so. Or maybe they want to encode entire words instead of individual letters... who knows!?

My problem:

I feel like there are so many design choices here that could make this code reusable for a slightly different task. I feel overwhelmed.

  1. Do I make the character set a class attribute or an instance attribute?
  2. Do I write getters and setters for the character set?
  3. Do I instead write a parent class, and subclasses for different character sets?
  4. Or do I make users pass their own OneHotEncoder object to my class, and not worry about it myself?
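Options 1 and 3 interact: because `__init__` reads `self.characters`, and there is no instance attribute of that name, the lookup falls through to the class hierarchy, so a subclass only has to override the class attribute. A minimal sketch, where the lowercase default alphabet and the `DnaEncoder` subclass are hypothetical examples:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder


class CharEncoder:
    # Hypothetical default alphabet, standing in for cn.ALL_LETTERS_ARRAY.
    characters = np.array(list("abcdefghijklmnopqrstuvwxyz"))

    def __init__(self):
        # No instance attribute named `characters` exists, so self.characters
        # resolves through the class hierarchy: a subclass override wins here.
        self.encoder = OneHotEncoder().fit(self.characters.reshape(-1, 1))

    def transform(self, word):
        return self.encoder.transform(np.array(list(word)).reshape(-1, 1)).toarray()

    def inverse_transform(self, X):
        return "".join(self.encoder.inverse_transform(X).reshape(-1))


class DnaEncoder(CharEncoder):
    # Hypothetical subclass: only the character set changes.
    characters = np.array(list("ACGT"))


dna = DnaEncoder()
X = dna.transform("GATTACA")
print(X.shape)                   # (7, 4)
print(dna.inverse_transform(X))  # GATTACA
```

The trade-off is one new subclass per alphabet, which suits a few well-known character sets better than arbitrary user-supplied ones.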

My question:

What are some considerations that might help guide my design choice here?

rocksNwaves
  • As far as I can tell, the only "configurable" element here is `characters`, correct? – DeepSpace Feb 28 '21 at 18:50
  • @DeepSpace Yes, definitely. And that one configuration spawns so many design choice questions lol. Further context is that I have another class where one attribute is a `CharEncoder` instance. That complicates everything even more! – rocksNwaves Feb 28 '21 at 18:51
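For the case where another class holds a `CharEncoder` (and for option 4), one common pattern is to let the outer class accept an encoder object and build a default only when none is given. A sketch under assumptions: `WordModel`, `featurize`, and the `"abc"` default alphabet are hypothetical, and the outer class only relies on the encoder having a `transform` method.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder


class CharEncoder:
    def __init__(self, characters):
        # Instance attribute: each encoder keeps its own alphabet.
        self.characters = np.asarray(characters)
        self.encoder = OneHotEncoder().fit(self.characters.reshape(-1, 1))

    def transform(self, word):
        return self.encoder.transform(np.array(list(word)).reshape(-1, 1)).toarray()

    def inverse_transform(self, X):
        return "".join(self.encoder.inverse_transform(X).reshape(-1))


class WordModel:
    """Hypothetical outer class that owns a CharEncoder."""

    def __init__(self, encoder=None):
        # Use the caller's encoder if given; otherwise build a default one.
        self.encoder = encoder if encoder is not None else CharEncoder(list("abc"))

    def featurize(self, word):
        return self.encoder.transform(word)


default_model = WordModel()
dna_model = WordModel(encoder=CharEncoder(list("ACGT")))
print(default_model.featurize("cab").shape)  # (3, 3)
print(dna_model.featurize("GATTACA").shape)  # (7, 4)
```

The outer class then never needs to know which character set is in use; that decision moves to whoever constructs it.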

1 Answer


I'd just make characters an instance attribute with a default value.

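class CharEncoder:
    def __init__(self, characters=cn.ALL_LETTERS_ARRAY):
        self.characters = characters  # instance attribute with a default value
        # sparse_output=False: this parameter was named `sparse` before scikit-learn 1.2
        self.encoder = OneHotEncoder(sparse_output=False).fit(self.characters.reshape(-1, 1))
        self.categories = self.encoder.categories_[0].tolist()

    # transform() and inverse_transform() stay as in the question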

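Caution: if cn.ALL_LETTERS_ARRAY is mutable (i.e. a Python list or a numpy array), the default value is evaluated once and shared by every instance that relies on it, so mutating it through one encoder affects all the others. In that case, use None as a sentinel value: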

def __init__(self, characters=None):
    # Explicit sentinel check rather than `characters or cn.ALL_LETTERS_ARRAY`:
    # `or` calls bool() on the argument, which raises ValueError for a numpy
    # array with more than one element, and it would also treat an empty
    # string/list/array/dict as "not given" because these evaluate to False.
    if characters is None:
        characters = cn.ALL_LETTERS_ARRAY
    self.characters = characters
    # ... rest of __init__ as above
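The `characters or default` shortcut has a sharper edge than the empty-value caveat when the argument is a numpy array: `or` calls `bool()` on it, and a multi-element array has no single truth value. A small sketch comparing the two forms; `DEFAULT`, `pick_with_or`, and `pick_with_is_none` are hypothetical names used only for illustration.

```python
import numpy as np

DEFAULT = np.array(list("abc"))  # hypothetical stand-in for cn.ALL_LETTERS_ARRAY


def pick_with_or(characters=None):
    # Shortcut form: evaluates bool(characters) before choosing.
    return characters or DEFAULT


def pick_with_is_none(characters=None):
    # Explicit sentinel check: never calls bool() on the argument.
    if characters is None:
        return DEFAULT
    return np.asarray(characters)


custom = np.array(list("xyz"))
print(pick_with_is_none(custom))  # ['x' 'y' 'z']
try:
    pick_with_or(custom)
except ValueError as err:
    # numpy refuses to decide whether a 3-element array is "true".
    print("or-shortcut failed:", err)
```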

Usage:

default_chars_encoder = CharEncoder()  # uses the default cn.ALL_LETTERS_ARRAY
custom_chars_encoder = CharEncoder(CUSTOM_CHARACTERS_SET)  # uses CUSTOM_CHARACTERS_SET
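A runnable sketch of the instance-attribute version with a default, so the usage above can be tried end to end. Assumptions: the default alphabet stands in for `cn.ALL_LETTERS_ARRAY` as ASCII letters plus punctuation, the custom set is a hypothetical DNA alphabet, and `transform` densifies the encoder's default sparse output with `.toarray()` instead of passing a `sparse`/`sparse_output` flag.

```python
import string

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for cn.ALL_LETTERS_ARRAY and a user's custom set.
ALL_LETTERS_ARRAY = np.array(list(string.ascii_letters + string.punctuation))
custom_characters = np.array(list("ACGT"))


class CharEncoder:
    def __init__(self, characters=None):
        if characters is None:
            characters = ALL_LETTERS_ARRAY
        # Instance attribute: each encoder keeps its own alphabet.
        self.characters = np.asarray(characters)
        self.encoder = OneHotEncoder().fit(self.characters.reshape(-1, 1))
        self.categories = self.encoder.categories_[0].tolist()

    def transform(self, word):
        # .toarray() densifies the default sparse output.
        return self.encoder.transform(np.array(list(word)).reshape(-1, 1)).toarray()

    def inverse_transform(self, X):
        return "".join(self.encoder.inverse_transform(X).reshape(-1))


default_chars_encoder = CharEncoder()
custom_chars_encoder = CharEncoder(custom_characters)
print(default_chars_encoder.transform("Hi!").shape)     # (3, 84)
print(custom_chars_encoder.transform("GATTACA").shape)  # (7, 4)
print(custom_chars_encoder.categories)                  # ['A', 'C', 'G', 'T']
```

With OneHotEncoder's default `handle_unknown="error"`, a character outside the chosen set (for example `custom_chars_encoder.transform("hello")`) raises a ValueError rather than being silently dropped.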
DeepSpace