
I would like to read a JSON file as JSON, without parsing it. I do not want to use a DataFrame; I would only like to read it as a regular file with the format still intact. Any ideas? I tried reading it with wholeTextFiles, but that creates a DataFrame.

RData
    Possible duplicate of [Read entire file in Scala?](https://stackoverflow.com/questions/1284423/read-entire-file-in-scala) – Harald Gliebe Oct 15 '18 at 17:25

3 Answers


Since you didn't accept the Spark-specific answer, maybe you could try a plain Scala solution like this (using the spray-json library):

import spray.json._

val source = scala.io.Source.fromFile("yourFile.txt")
val lines = try source.mkString finally source.close()
val yourJson = lines.parseJson
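
A rough sketch of what you can then do with the parsed JsValue (the "first_name" field here is only an illustration, not something from your file):

// Pretty-print the JSON back out, or pull out a single field as an Option[JsValue]
println(yourJson.prettyPrint)
val firstName = yourJson.asJsObject.fields.get("first_name")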
Florian Baierl

The upickle library is the easiest "pure Scala" way to read a JSON file:

val jsonString = os.read(os.pwd/"src"/"test"/"resources"/"phil.json")
val data = ujson.read(jsonString)
data.value // LinkedHashMap("first_name" -> Str("Phil"), "last_name" -> Str("Hellmuth"), "birth_year" -> Num(1964.0))
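
As a small follow-up sketch (using the field names from the phil.json example above), individual values can be pulled straight out of the parsed ujson.Value:

data("first_name").str // "Phil"
data("birth_year").num // 1964.0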

See this post for more details.

The code snippet above uses os-lib to read the file from disk. If you're running the code in a cluster environment, you'll probably want to use a different library, depending on where the file is located and your environment.

Avoid the other Scala JSON libraries because they're hard to use.

Powers
  • I use this in IntelliJ, but I get the error ***not found: value os***; I reloaded IntelliJ, but the error is still there. Would you please take a look at this question: https://stackoverflow.com/q/75529227/6640504. Thank you. – M_Gh Feb 24 '23 at 07:19

I've noticed you specified the apache-spark tag; if you meant something for vanilla Scala, this answer will not be applicable. Using this code you can get an RDD[String], which is the most text-like kind of distributed data structure.

// Where sc is your Spark context
val textFile = sc.textFile("myFile.json")
// textFile: org.apache.spark.rdd.RDD[String]
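
If what you ultimately need is the whole file as one raw JSON string rather than an RDD of lines, a minimal sketch (assuming the file is small enough to bring back to the driver) is to use wholeTextFiles, which yields (path, content) pairs rather than a DataFrame:

// One (path, content) pair per file; grab the content of the single file
val rawJson: String = sc.wholeTextFiles("myFile.json").values.first()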
Tresdon
  • But is the RDD in JSON format? I would need to read it as JSON – RData Oct 15 '18 at 17:50
  • I'm kind of confused what's being asked – this will read it as a plain string (without parsing). Otherwise options like `spark.read.json()` will put it into a dataframe which I thought you were hoping to avoid. Note this is using the SparkSessions API – Tresdon Oct 15 '18 at 20:01
  • No parsing needed; I need the JSON file to submit to another process which expects JSON input – RData Oct 16 '18 at 21:19
  • Does it expect the name of a JSON file like `my_file.json`, or a string formatted as JSON like `{key: value, key1: value}`? I'm assuming the latter, because the former is as simple as specifying a file name. If it is the latter, you can try something like this to get the result: `import scala.io.Source; val fileContents: String = Source.fromFile(filename).getLines.mkString` – Tresdon Oct 16 '18 at 22:23
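
For reference, a minimal runnable version of that last suggestion (reading with mkString instead of getLines so the original formatting, including newlines, stays intact):

import scala.io.Source

val source = Source.fromFile("my_file.json")
val fileContents: String = try source.mkString finally source.close()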