I am working on converting a JavaRDD (where the string is a JSON string) into a dataframe and display it. I am doing something like below,
public void call(JavaRDD<String> rdd, Time time) throws Exception {
if (rdd.count() > 0) {
JavaRDD<String> filteredRDD = rdd.filter(x -> x.length()>0);
sqlContext = SQLContextSingleton.getInstance(filteredRDD.context());
DataFrame df = sqlContext.read().schema(SchemaBuilder.buildSchema()).json(filteredRDD);
df.show();
}
}
Where as the schema is as follows,
public static StructType buildSchema() {
StructType schema = new StructType(
new StructField[] { DataTypes.createStructField("student_id", DataTypes.StringType, false),
DataTypes.createStructField("school_id", DataTypes.IntegerType, false),
DataTypes.createStructField("teacher", DataTypes.StringType, true),
DataTypes.createStructField("rank", DataTypes.StringType, true),
DataTypes.createStructField("created", DataTypes.TimestampType, true),
DataTypes.createStructField("created_user", DataTypes.StringType, true),
DataTypes.createStructField("notes", DataTypes.StringType, true),
DataTypes.createStructField("additional_data", DataTypes.StringType, true),
DataTypes.createStructField("datetime", DataTypes.TimestampType, true) });
return (schema);
}
The above code is returning me,
|student_id|school_id|teacher|rank|created|created_user|notes|additional_data|datetime|
+----------+------+--------+-----+-----------+-------+------------+--------+-------------+-----+-------------------+---------+---------------+--------+----+-------+-----------+
| null| null| null| null| null| null| null| null| null|
But, When I don't specify the schema and create the Dataframe as,
DataFrame df = sqlContext.read().json(filteredRDD);
This returned me the result as below,
|student_id|school_id|teacher|rank|created|created_user|notes|additional_data|datetime|
+----------+------+--------+-----+-----------+-------+------------+--------+-------------+-----+-------------------+---------+---------------+--------+----+-------+-----------+
| 1| 123| xxx| 3| 2017-06-02 23:49:10.410| yyyy| NULL| good academics| 2017-06-02 23:49:10.410|
Sample JSON Record:
{"student_id": "1","school_id": "123","teacher": "xxx","rank": "3","created": "2017-06-02 23:49:10.410","created_user":"yyyy","notes": "NULL","additional_date":"good academics","datetime": "2017-06-02 23:49:10.410"}
Any help on what I am doing wrong?