
I am looking for a code snippet showing the best practice for reading multiple nested JSON files under subdirectories in Hadoop using Scala.

If the above JSON files could also be written into one single file in some other directory in Hadoop, that would be even better.

Any help is appreciated.

Thanks PG

  • Are you using Spark with the Scala API, or how are you using Scala in Hadoop? – Shankar Sep 29 '16 at 06:44
  • Thanks for your response. I am using Spark with the Scala API. – user3054752 Sep 29 '16 at 10:36
  • You can use `sqlContext.read.json("json file path")` to read a JSON file; it returns a `DataFrame`. But you mentioned nested directories; do the JSON files have different schemas? – Shankar Sep 29 '16 at 14:38
  • Thanks Shankar. The files will have similar schemas, and reading the files worked. The next step: can I write all the files into one single JSON file, ideally in 1-2 steps, to be performance efficient? – user3054752 Sep 29 '16 at 20:20
  • Take a look here. I think the top answer may help: http://stackoverflow.com/questions/28203217/how-to-load-directory-of-json-files-into-apache-spark-in-python – sascha10000 Sep 29 '16 at 23:31

1 Answer


You can use `sqlContext.read.json("input file path")` to read JSON files; it returns a `DataFrame`.

Once you have the `DataFrame`, just use `df.write.json("output file path")` to write it back out as JSON.

Code example, if you use Spark 2.0:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL JSON example")
  .getOrCreate()

// Read the JSON files into a DataFrame
val df = spark.read.json("input/file/path")

// Write the DataFrame back out as JSON
df.write.json("output/file/path")
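
Since the question also asks about nested subdirectories and a single output file, here is a minimal sketch of one way to do that with Spark 2.0. The HDFS paths and the glob pattern are placeholder assumptions, and note that Spark always writes a directory of part files; `coalesce(1)` just ensures that directory contains a single part file, which is only advisable when the merged data fits comfortably in one partition.

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Merge nested JSON files")
  .getOrCreate()

// Glob pattern matching JSON files one level below the input directory;
// add more "/*" segments (or pass several paths) for deeper nesting.
val df = spark.read.json("hdfs:///data/json/*/*.json")

// coalesce(1) forces a single partition, so the output directory
// contains a single part-*.json file with all the records.
df.coalesce(1)
  .write
  .mode("overwrite")
  .json("hdfs:///data/json-merged")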
Shankar