
Adding metadata to a column in PySpark is easy:

from pyspark.sql.functions import col

df = df.withColumn("foo", col("foo").alias("foo", metadata={...}))
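For reference, here is a minimal runnable sketch of that pattern (the DataFrame and the metadata dict are made-up placeholders); the metadata can be read back from the schema:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,)], ["foo"])

# re-alias the column under its own name to attach the metadata
df = df.withColumn("foo", col("foo").alias("foo", metadata={"comment": "example"}))

# the metadata lives on the column's StructField in the schema
print(df.schema["foo"].metadata)  # {'comment': 'example'}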

BUT I need to do this in a SQLTransformer, without a custom transformer, as part of the ML pipeline. So after I do:

from pyspark.ml.feature import StandardScaler

scalerTransformer = StandardScaler(inputCol='features',
                                   outputCol='scaledFeatures')

I want to rename the scaledFeatures column back to features, something like:

from pyspark.ml.feature import SQLTransformer

fieldTransformer = SQLTransformer(
    statement="SELECT scaledFeatures AS features FROM __THIS__")

but with the metadata preserved in the features column. The reason I want this is that the JPMML-SparkML library lacks support for custom transformations and certain kinds of transformers.
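For completeness, a sketch of how the two stages would sit together in a Pipeline (assuming df is a DataFrame with a vector-typed features column; whether the SQLTransformer stage can attach or preserve the metadata is exactly the open question):

from pyspark.ml import Pipeline
from pyspark.ml.feature import SQLTransformer, StandardScaler

scalerTransformer = StandardScaler(inputCol='features',
                                   outputCol='scaledFeatures')
fieldTransformer = SQLTransformer(
    statement="SELECT scaledFeatures AS features FROM __THIS__")

# the rename runs as an ordinary pipeline stage, no custom transformer needed
pipeline = Pipeline(stages=[scalerTransformer, fieldTransformer])
# model = pipeline.fit(df)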

  • Possible duplicate of [How can I declare a Column as a categorical feature in a DataFrame for use in ml](https://stackoverflow.com/questions/37473380/how-can-i-declare-a-column-as-a-categorical-feature-in-a-dataframe-for-use-in-ml) – 10465355 Mar 13 '19 at 14:36
  • Also [How to change column metadata in pyspark?](https://stackoverflow.com/q/44273080/10465355) – 10465355 Mar 13 '19 at 14:37
  • Guys, this is not a duplicate. I know the solution of creating a new column with the metadata, as I wrote in the first code line of my question. The question is how to do that in SQLTransformer, so I can do it as a pipeline step, without writing a custom transformer with a df = df.withColumn line. – user1450410 Mar 14 '19 at 08:58

0 Answers