I am using SparkSession's read().json method to read a JSON file before converting it to a Parquet file. It works fine, but the .json(JavaRDD<String>) overload is marked as deprecated. Is there an alternative? I am currently using Java with Spark 2.4.3.

I have gone through the Spark documentation but did not find an alternative method.

JavaSparkContext sc1 = JavaSparkContext.fromSparkContext(SparkContext.getOrCreate());
JavaRDD<String> rddData = sc1.parallelize(data);
Dataset<Row> dataDF = spark.read().json(rddData);

Here the .json method is flagged as deprecated. Is there an alternative to it?

I've gone through "How to parse JSON in Spark with fasterxml without SparkSQL?", but the answers there suggest an SQLContext method, which is also deprecated.

I need an alternative to spark.read().json(JavaRDD<String>) in Java.

mazaneicha
raj03
  • First, you are using raw types. You should be using `JavaRDD<Row>` and `Dataset<Row>` (if, indeed, they are an RDD/Dataset of rows). Don't leave them raw. Second, have you read the suggestion in the [Documentation](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html)? – RealSkeptic Aug 05 '19 at 12:41
  • Not possible, as `List data = Arrays.asList(object); JavaRDD rddData = sc1.parallelize(data);`, hence I am not able to convert the data to type String. – raj03 Aug 05 '19 at 14:06

1 Answer

Seems like all you have to do is convert your RDD to a Dataset<String> (as @RealSkeptic suggested) and pass that to the non-deprecated json(Dataset<String>) overload:

Dataset<Row> dataDF_spark24 = spark.read().json(spark.createDataset(rddData.rdd(), Encoders.STRING()));

Alternatively, if you're not tied to keeping JavaRDD<String> rddData = ..., this can be simplified further by building the Dataset<String> directly from the list:

Dataset<String> dfData = spark.createDataset(data, Encoders.STRING());
Dataset<Row> dataDF_spark24 = spark.read().json(dfData);
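
For reference, here is a minimal self-contained sketch that puts both variants together. The class name JsonDatasetExample, the helper names fromRdd and fromList, the sample JSON strings, and the local master setting are all assumptions made for illustration. The sketch also assumes data is a List<String> holding one JSON document per element.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical example class; names and sample data are illustrative only.
public class JsonDatasetExample {

    // Variant 1: keep the existing JavaRDD<String> and wrap it in a Dataset<String>.
    static Dataset<Row> fromRdd(SparkSession spark, JavaRDD<String> rddData) {
        Dataset<String> jsonLines = spark.createDataset(rddData.rdd(), Encoders.STRING());
        return spark.read().json(jsonLines);
    }

    // Variant 2: skip the RDD and build the Dataset<String> straight from the list.
    static Dataset<Row> fromList(SparkSession spark, List<String> data) {
        Dataset<String> jsonLines = spark.createDataset(data, Encoders.STRING());
        return spark.read().json(jsonLines);
    }

    public static void main(String[] args) {
        // Local master is an assumption so the sketch can run without a cluster.
        SparkSession spark = SparkSession.builder()
                .appName("json-dataset-example")
                .master("local[*]")
                .getOrCreate();
        try {
            // Assumed sample input: one JSON object per string.
            List<String> data = Arrays.asList(
                    "{\"id\":1,\"name\":\"a\"}",
                    "{\"id\":2,\"name\":\"b\"}");

            JavaSparkContext sc = JavaSparkContext.fromSparkContext(spark.sparkContext());
            Dataset<Row> viaRdd = fromRdd(spark, sc.parallelize(data));
            Dataset<Row> viaList = fromList(spark, data);

            // Both paths infer the schema from the JSON strings.
            viaRdd.printSchema();
            viaList.show();
        } finally {
            spark.stop();
        }
    }
}
```

Either resulting Dataset<Row> can then be written out with write().parquet(...) as before. If the list holds plain objects rather than JSON strings, they would need to be serialized to JSON strings first.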
mazaneicha