I am working with Spark ML pipelines and often end up with a bunch of SQLTransformers that each do something different in a pipeline. I can't tell what any of them does without reading its entire statement.
I would like to attach some simple documentation or a tag to each transformer, which is persisted when the transformer is saved and can be retrieved later if need be.
Basically, something like this:
s = SQLTransformer()
s.tag = "basic target generation"
s.save("tmp")
s2 = SQLTransformer.load("tmp")
print(s2.tag)
or:
s = SQLTransformer()
s.setParam(tag="basic target generation")
s.save("tmp")
s2 = SQLTransformer.load("tmp")
print(s2.getParam("tag"))
I can see that I can't do either right now, because the param objects are locked down: I can't seem to modify any existing param other than statement, and I can't add new ones. Is there anything I can do to get functionality like this?
I am using Spark 2.1.1 with Python.
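One possible workaround, sketched here under assumptions rather than as a confirmed solution: since statement is the one param that can be set and is persisted, the tag could travel inside the statement as a leading SQL comment. The helper names `with_tag` and `read_tag` and the `-- tag: ` prefix are hypothetical conventions, not part of Spark, and this assumes the Spark SQL parser ignores a leading `--` line comment, which should be verified on the Spark version in use.

```python
# Hypothetical convention: the first line of the statement is a SQL comment
# holding the tag. Nothing here is part of the Spark API.
TAG_PREFIX = "-- tag: "


def with_tag(statement, tag):
    """Return the statement with a one-line SQL comment carrying the tag."""
    if "\n" in tag or "\r" in tag:
        raise ValueError("tag must be a single line")
    return TAG_PREFIX + tag + "\n" + statement


def read_tag(statement):
    """Return the tag stored in the leading comment, or None if absent."""
    first_line = statement.split("\n", 1)[0]
    if first_line.startswith(TAG_PREFIX):
        return first_line[len(TAG_PREFIX):]
    return None


# Plain-string round trip; with PySpark the tagged string would be passed
# as SQLTransformer(statement=...) and read back via getStatement().
tagged = with_tag("SELECT *, (a + b) AS target FROM __THIS__",
                  "basic target generation")
print(read_tag(tagged))
```

With a transformer, the idea would be to build it as `SQLTransformer(statement=with_tag(sql, "basic target generation"))`, save and load it as usual, and then call `read_tag(s2.getStatement())`. The tag then lives in the saved statement itself, so no extra param is needed.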