
I can't figure out how to convert a List of Test objects to a Dataset in Spark. This is my class:

public class Test {
    public String a;
    public String b;
    public Test(String a, String b){
        this.a = a;
        this.b = b;
    }

    public List getList(){
        List l = new ArrayList();
        l.add(this.a);
        l.add(this.b);

        return l;
    }
}
Edu
  • `Test t = new Test("1", "Test"); Test tt = new Test("11", "Test2"); List tl = new ArrayList(); tl.add(t); tl.add(tt); Dataset ds = spark.createDataFrame(tl, Test.class);` – Edu Dec 13 '16 at 11:26

1 Answer


Your code in the comments to create a DataFrame is correct. The problem is the way Test is defined: this approach only works with Java Beans, and Test is not one. A Java Bean has a public no-argument constructor and exposes its properties through getters and setters, whereas Test has only a two-argument constructor and exposes a and b as public fields. Once you fix that, you can use the following code to create a DataFrame:

Dataset<Row> dataFrame = spark.createDataFrame(listOfTestClasses, Test.class);

and these lines to create a typed Dataset:

Encoder<Test> encoder = Encoders.bean(Test.class);
Dataset<Test> dataset = spark.createDataset(listOfTestClasses, encoder);
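For reference, a bean-style version of Test might look like the sketch below. It is an assumption about what "fixing" the class means, not code from the question: the fields become private, a no-argument constructor is added, and each field gets a getter and a setter. The original getList() method is left out so that a and b are the only bean properties. The BeanCheck helper is hypothetical and not part of Spark; it only uses the JDK's java.beans.Introspector to list the properties that have both a getter and a setter, which is a quick way to check that a class follows the bean conventions.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Bean-style rewrite of Test (a sketch). In a real project this would be
// `public class Test` in Test.java; it is package-private here only so the
// sketch fits in one file.
class Test implements Serializable {
    private String a;
    private String b;

    // Java Beans need a public no-argument constructor.
    public Test() {
    }

    public Test(String a, String b) {
        this.a = a;
        this.b = b;
    }

    public String getA() { return a; }
    public void setA(String a) { this.a = a; }

    public String getB() { return b; }
    public void setB(String b) { this.b = b; }
}

// Hypothetical helper (not a Spark API): lists bean properties of a class.
class BeanCheck {
    // Returns the names of properties that have both a getter and a setter,
    // sorted alphabetically. Object.class is the stop class, so getClass()
    // is not reported as a property.
    static List<String> beanProperties(Class<?> type) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + type, e);
        }
    }
}
```

With a class like this, listOfTestClasses in the snippets above is simply a `List<Test>`, for example an `ArrayList<Test>` filled with `new Test("1", "Test")` and `new Test("11", "Test2")`. Calling `BeanCheck.beanProperties(Test.class)` on the question's original class would not report a or b, because public fields are not bean properties.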
Anton Okolnychyi
  • By default the encoder sorts the variables alphabetically when creating the columns of my dataset. How can I change the default column order for the Test class? – Edu Dec 19 '16 at 13:50
  • What is listOfTestClasses here? Is it `List listOfTestClasses`? If so, it is not working for me :( – Murtaza Kanchwala Aug 29 '17 at 11:17