I'm currently working on some heavy data analytics projects, and am trying to create a Python wrapper class to streamline the mundane preprocessing steps involved in cleaning data, partitioning it into test / validation sets, standardizing it, etc. The idea is ultimately to transform raw data into processed matrices that machine learning algorithms can consume directly for training and testing. Ideally, I'm working towards the point where I can write:

data = DataModel(AbstractDataModel)
processed_data = data.execute_pipeline(**kwargs)

In many cases I'll start off with self.df, a pandas DataFrame for my instance. A method such as standardize_data() will then ultimately produce a standardized DataFrame stored as self.std_df.

My IDE has been complaining heavily about me initializing variables outside of __init__. So to try to soothe PyCharm, I've been using the following code inside my constructor:

class AbstractDataModel(ABC):

    @abstractmethod
    def __init__(self, input_path, ...,  **kwargs):

        self.df_train, self.df_test, self.train_ID, self.test_ID, self.primary_key, ... (many more variables) = None, None, None, None, None, ...

Later on, these attributes are assigned their real values. I'll admit that I'm coming from heavy-duty Java Spring projects, so I'm still used to verbosely declaring variables. Is there a more Pythonic way of declaring my instance attributes here? I know I must be violating DRY with all the None values.

I've searched SO and came across this similar question, but the answer provided there is more about setting instance variables through argv, so it isn't a direct solution in my context.

Yu Chen
  • @Alexander I've attempted this solution. However, Pycharm complains that it needs more values to unpack, and at runtime, I get a `TypeError: 'NoneType' object is not iterable` error. – Yu Chen Aug 06 '17 at 20:53
  • @Alexander, that would result in an error (NoneType not iterable), maybe you meant `var1 = var2 = ... = varN = None` – AChampion Aug 06 '17 at 20:53
  • My mistake, `self.var1 = self.var2 = ... = self.var_n = None` – Alexander Aug 06 '17 at 20:54
  • As an aside, when you have lots of variables inside a class instance, you may want to alphabetize them and declare each individually. I find that it helps me to maintain code. – Alexander Aug 06 '17 at 20:58
  • @Alexander that's a great idea. – Yu Chen Aug 06 '17 at 20:59

1 Answer

Use chained assignment:

self.df_train = self.df_test = self.train_ID = self.test_ID = self.primary_key = ... = None
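As a minimal sketch of how chained assignment behaves (the `DataModel` class below is a hypothetical stand-in with only four of the attributes from the question), every name on the left is bound to the same `None` object, and rebinding one attribute later does not affect the others:

```python
class DataModel:
    """Hypothetical stand-in for the model class in the question."""

    def __init__(self):
        # A single None object is bound to every attribute on the left.
        self.df_train = self.df_test = self.train_ID = self.test_ID = None


model = DataModel()

# Both attributes refer to the None singleton.
print(model.df_train is model.df_test)  # True

# Rebinding one attribute leaves the others untouched.
model.df_train = [1, 2, 3]
print(model.df_test)  # None
```

This is safe for `None` because it is immutable. Chaining a mutable default such as `[]` would make all the attributes share one list, which is usually not what you want.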

Or set up attributes on the abstract class that default to None (so you don't have to set them in `__init__`).
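One possible reading of this second suggestion is plain class-level attributes on the abstract base. The sketch below makes several assumptions: the attribute list is shortened, plain lists stand in for pandas DataFrames, and the body of `standardize_data()` is a placeholder that only centers the values.

```python
from abc import ABC, abstractmethod


class AbstractDataModel(ABC):
    # Class-level defaults: an instance sees None until it assigns its own value.
    df_train = None
    df_test = None
    std_df = None

    @abstractmethod
    def standardize_data(self):
        """Subclasses fill in self.std_df here."""


class DataModel(AbstractDataModel):
    def __init__(self, values):
        # Assigning on the instance shadows the class-level default.
        self.df_train = values

    def standardize_data(self):
        # Placeholder logic: subtract the mean (a list stands in for a DataFrame).
        mean = sum(self.df_train) / len(self.df_train)
        self.std_df = [v - mean for v in self.df_train]
        return self.std_df


model = DataModel([1.0, 2.0, 3.0])
print(model.df_test)                # None, inherited from the class
print(model.standardize_data())     # [-1.0, 0.0, 1.0]
print(AbstractDataModel.std_df)     # None, the class default is unchanged
```

With this layout the attributes are declared once, outside `__init__`, and assigning to `self.std_df` inside a method creates an instance attribute without touching the class default.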

Artyer
  • This is such a good idea, and so simple! Definitely Pythonic. Speaking of the abstract properties that default to `None`, could you clarify how that is different than what I'm doing in `__init__`? Since my constructor is an abstract method and all my properties are being set to `None` within this method? – Yu Chen Aug 06 '17 at 20:57
  • 1
    `None` holds its own location in memory (`id(None)`). Anything that has a value equal to `None` will point to this memory location. Both methods are equivalent in their functionality, although one is much simpler. – Alexander Aug 06 '17 at 21:11
  • Got it. So `self.df_train is self.df_test` will evaluate to `True` until I assign them values. – Yu Chen Aug 06 '17 at 22:57