I am learning big data using Apache Spark, and I want to create a custom Transformer for Spark ML so that I can execute aggregate functions or perform other operations on a dataset.

ngi
- You can take inspiration from, and adapt for Java, the PySpark answer at https://stackoverflow.com/questions/32331848/create-a-custom-transformer-in-pyspark-ml – bonnal-enzo Apr 26 '22 at 13:27
- @bonnal-enzo I am new to big data and really don't understand how to do this; I can't find any sample or example in Java showing how to use it. – ngi Apr 26 '22 at 15:31
1 Answer
You need to extend the org.apache.spark.ml.Transformer class. It is abstract, so you have to provide implementations of its abstract methods.
From what I have seen, in most cases you need to implement the transform(Dataset<?> dataset) method and String uid().
Example:
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.util.Identifiable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructType;

public class CustomTransformer extends Transformer {
    private final String uid_;

    public CustomTransformer() {
        this(Identifiable.randomUID("CustomTransformer"));
    }

    public CustomTransformer(String uid) {
        this.uid_ = uid;
    }

    @Override
    public String uid() {
        return uid_;
    }

    @Override
    public Transformer copy(ParamMap extra) {
        return defaultCopy(extra);
    }

    @Override
    public Dataset<Row> transform(Dataset<?> dataset) {
        // do your work here and return the resulting Dataset;
        // as a placeholder, return the input unchanged
        return dataset.toDF();
    }

    @Override
    public StructType transformSchema(StructType schema) {
        return schema;
    }
}
I am also new to this, so I suggest you learn what each of these abstract methods is used for.
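To make this concrete, here is a minimal sketch of defining and using such a transformer end to end. The transformer name (DoublingTransformer), the column names ("value", "doubled"), and the local-mode SparkSession are all illustrative assumptions, not part of the original answer; it assumes spark-sql and spark-mllib are on the classpath.

```java
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.util.Identifiable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;

// Hypothetical transformer that adds a "doubled" column (2 * "value").
class DoublingTransformer extends Transformer {
    private final String uid_;

    DoublingTransformer() {
        this(Identifiable.randomUID("DoublingTransformer"));
    }

    DoublingTransformer(String uid) {
        this.uid_ = uid;
    }

    @Override
    public String uid() {
        return uid_;
    }

    @Override
    public Transformer copy(ParamMap extra) {
        return defaultCopy(extra);
    }

    @Override
    public Dataset<Row> transform(Dataset<?> dataset) {
        return dataset.toDF().withColumn("doubled", col("value").multiply(2));
    }

    @Override
    public StructType transformSchema(StructType schema) {
        // A real implementation should append the new column's field here;
        // returning the input schema unchanged keeps the sketch short.
        return schema;
    }
}

public class Demo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("custom-transformer-demo")
                .master("local[*]") // local mode, just for the example
                .getOrCreate();

        Dataset<Row> df = spark.range(1, 4).toDF("value"); // rows: 1, 2, 3
        Dataset<Row> out = new DoublingTransformer().transform(df);
        out.show(); // "doubled" column holds 2, 4, 6
        spark.stop();
    }
}
```

Because the class extends Transformer, an instance can also be dropped into a Pipeline alongside other stages.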

MostlyJava