Adding metadata to a new field in PySpark is easy:

df.withColumn("foo", col("foo").alias("foo", metadata={...}))
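For example, here is a minimal round trip (the metadata dict is just for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,), (2.0,)], ["foo"])

# attach metadata by re-aliasing the column
df = df.withColumn("foo", col("foo").alias("foo", metadata={"comment": "demo"}))

# the metadata shows up on the schema field
print(df.schema["foo"].metadata)  # {'comment': 'demo'}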
BUT I need to do this in SQLTransformer, without a custom transformer, as part of the ML pipeline. So after I do:
scalerTransformer = StandardScaler(inputCol='features',
outputCol='scaledFeatures')
I want to rename the scaledFeatures column back to features, with something like:
fieldTransformer = SQLTransformer(statement="SELECT scaledFeatures AS features FROM __THIS__")
but with the metadata preserved in the features column. The reason I'm doing this is the lack of support for custom transformations and certain kinds of transformers in the JPMML-SparkML library.
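Putting it together, the pipeline I have in mind looks roughly like this (a sketch; the stage names are my own):

from pyspark.ml import Pipeline
from pyspark.ml.feature import SQLTransformer, StandardScaler

scalerTransformer = StandardScaler(inputCol='features',
                                   outputCol='scaledFeatures')

# renames scaledFeatures back to features, but I see no way
# to attach metadata to the resulting column here
fieldTransformer = SQLTransformer(
    statement="SELECT scaledFeatures AS features FROM __THIS__")

pipeline = Pipeline(stages=[scalerTransformer, fieldTransformer])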