
I would like to create a table in Spark SQL using the data below.

[{
  "empstr": "Blogspan",
  "empbyte": 48,
  "empshort": 457,
  "empint": 935535,
  "emplong": 36156987676070,
  "empfloat": 6985.98,
  "empdoub": 6392455.0,
  "empdec": 0.447,
  "empbool": 0,
  "empdate": "09/29/2018",
  "emptime": "2018-03-24 12:56:26"
}, {
  "empstr": "Lazzy",
  "empbyte": 9,
  "empshort": 460,
  "empint": 997408,
  "emplong": 37564196351623,
  "empfloat": 7464.75,
  "empdoub": 5805694.86,
  "empdec": 0.303,
  "empbool": 1,
  "empdate": "08/14/2018",
  "emptime": "2018-06-17 18:31:15"
}]

But when I try to print the schema, it shows `_corrupt_record`. Could anyone please help me read a nested/multiline JSON record in Java with Spark 2.1.1? Below I attach my code:

case "readjson":

    tempTable = hiveContext.read().json(hiveContext.sparkContext().wholeTextFiles("1.json", 0));
    /* In the line above I get an error at .json:
       "The method json(String...) in the type DataFrameReader
        is not applicable for the arguments (RDD<Tuple2<String,String>>)" */

    // tempTable = hiveContext.read().json(componentBean.getHdfsPath());

    tempTable.printSchema();
    tempTable.show();
    tempTable.createOrReplaceTempView(componentKey);
    break;
SHG
Sai Mammahi
  • The `multiline` option was introduced in 2.2 – 10465355 Mar 05 '19 at 09:01
  • @user10465355 Yeah, Spark 2.2 has the multiline option, but since I am working on Spark 2.1.1 I need a solution for that version. I have tried all the possibilities I could think of, but could not find a solution to the above problem. – Sai Mammahi Mar 05 '19 at 09:03
  • The solution provided in the link tackles the problem both before and after 2.2 – eliasah Mar 05 '19 at 09:07
  • @eliasah It might work with Scala, but it is not working with Java; below I am attaching my screenshot, kindly take a look at it. – Sai Mammahi Mar 05 '19 at 09:17
  • When I use sparkContext.wholeTextFiles, it asks me for (path, minimum partitions). If I pass both parameters, an error comes at .json saying json supports json(String...) but I am passing an RDD, so it suggests casting to Seq. If I cast to Seq, the Spark job will not run. So kindly help me out of this. – Sai Mammahi Mar 05 '19 at 09:27
  • Can you update your question with the "new" thing you've tried (code and error msg) ? – eliasah Mar 05 '19 at 09:32
  • @eliasah i have updated the content kindly take a look at it. thanks – Sai Mammahi Mar 05 '19 at 12:44
  • @svk041994 I've answered your question down below. – eliasah Mar 05 '19 at 13:36
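
For context, the `multiLine` option the comments refer to (available from Spark 2.2 onward) would be used like this; the file path here is a placeholder:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiLineJsonExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MultiLineJsonExample")
                .master("local[*]")
                .getOrCreate();

        // Spark 2.2+ only: tells the reader that one JSON record may
        // span multiple lines (e.g. a pretty-printed JSON array).
        Dataset<Row> df = spark.read()
                .option("multiLine", true)
                .json("path/to/1.json"); // placeholder path

        df.printSchema();
        spark.stop();
    }
}
```

On 2.1.1, as discussed below, the workaround is to feed the reader an RDD of whole-file strings instead.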

1 Answer

It seems like you are having issues with which parts of the API to use.

You need to remember that SparkContext != JavaSparkContext.

This means that you'll need to create a JavaSparkContext object from your active SparkSession:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

// [...]

SparkSession session = SparkSession.builder().getOrCreate();
SQLContext hiveContext = session.sqlContext();
JavaSparkContext sc = JavaSparkContext.fromSparkContext(session.sparkContext());
JavaRDD<String> jsonRDD = sc.wholeTextFiles("path/to/data", 2).values();
Dataset<Row> jsonDataset = hiveContext.read().json(jsonRDD);

jsonDataset.show();

// +-------+-------+----------+------+----------+--------+------+--------------+--------+--------+-------------------+
// |empbool|empbyte|   empdate|empdec|   empdoub|empfloat|empint|       emplong|empshort|  empstr|            emptime|
// +-------+-------+----------+------+----------+--------+------+--------------+--------+--------+-------------------+
// |      0|     48|09/29/2018| 0.447| 6392455.0| 6985.98|935535|36156987676070|     457|Blogspan|2018-03-24 12:56:26|
// |      1|      9|08/14/2018| 0.303|5805694.86| 7464.75|997408|37564196351623|     460|   Lazzy|2018-06-17 18:31:15|
// +-------+-------+----------+------+----------+--------+------+--------------+--------+--------+-------------------+

I hope this will help.
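
From there, registering a temporary view gets back to the original goal of querying the data with Spark SQL. A short sketch (the view name `emp` is arbitrary):

```java
// Register the parsed JSON as a temporary view (the name is arbitrary).
jsonDataset.createOrReplaceTempView("emp");

// Query it with Spark SQL. empdate arrives as a plain string in
// MM/dd/yyyy format, so unix_timestamp() is used to parse it here
// (to_date(col, fmt) only appeared in Spark 2.2).
Dataset<Row> result = hiveContext.sql(
    "SELECT empstr, empint, " +
    "       CAST(CAST(unix_timestamp(empdate, 'MM/dd/yyyy') AS TIMESTAMP) AS DATE) AS empdate " +
    "FROM emp WHERE empbool = 1");

result.show();
```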

eliasah
  • can you help and suggest how to handle this https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:48