
I am learning Spark now. When I tried to load a json file, as follows:

people=sqlContext.jsonFile("C:\wdchentxt\CustomerData.json")

I got the following error:

AttributeError: 'SQLContext' object has no attribute 'jsonFile'

I am running this on Windows 7 PC, with spark-2.1.0-bin-hadoop2.7, and Python 2.7.13 (Dec 17, 2016).

Thank you for any suggestions that you may have.

user281707
  • I have Spark 2.0.0 on macOS. However, can you check if `sqlContext.read.json()` works for you? For me, if I want a custom configuration of my Spark, I can also do `sc = SparkContext(conf=conf)` and then `sqlContext = SQLContext(sc)` – titipata Jan 13 '17 at 19:14
  • `.jsonFile` has been deprecated; you should use `.read.json()` instead. – Kirk Broadhurst Jan 13 '17 at 19:17
  • Thank you all for the quick help. It worked when I replaced `.jsonFile` with `.read.json`. That's an easy fix. – user281707 Jan 13 '17 at 21:05
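As an aside, Spark's `read.json` expects the "JSON Lines" format by default: one complete JSON object per line, rather than a single pretty-printed array. A quick plain-Python sketch of what such a file looks like, independent of Spark (the records and file here are made up for illustration):

```python
import json
import os
import tempfile

# Example records; Spark infers one row per JSON object.
records = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 28},
]

# Write a JSON Lines file: one object per line, no enclosing array.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading it back line by line mirrors what Spark does per partition.
with open(path) as f:
    loaded = [json.loads(line) for line in f]

print(loaded[0]["name"])  # Alice
```

A file containing a single multi-line, pretty-printed object would instead need Spark's `multiLine` option (available in Spark 2.2+).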

3 Answers


You probably forgot to import the implicits. This is what my solution looks like in Scala:

def loadJson(filename: String, sqlContext: SQLContext): Dataset[Row] = {
  import sqlContext.implicits._
  val df = sqlContext.read.json(filename)
  df
}
Sami Badawi

As mentioned before, .jsonFile(...) has been deprecated [1]; use this instead:

people = sqlContext.read.json(r"C:\wdchentxt\CustomerData.json").rdd

Source:

[1]: https://docs.databricks.com/spark/latest/data-sources/read-json.html

Tshilidzi Mudau

First, the more recent versions of Spark (like the one you are using) use .read.json(...) instead of the deprecated .jsonFile(...).

Second, you need to be sure that your SQLContext is set up correctly, as mentioned here: pyspark : NameError: name 'spark' is not defined. In my case, it's set up like this:

from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
myObjects = sqlContext.read.json('file:///home/cloudera/Downloads/json_files/firehose-1-2018-08-24-17-27-47-7066324b')

Note that they have version-specific quick-start tutorials that can help with getting some of the basic operations right, as mentioned here: name spark is not defined

So, my point is: whatever library or language you are using (and this applies across all technologies), always follow the documentation that matches the version you are running, because breaking changes between versions are a very common source of confusion. If the technology you are using is not well documented in the version you are running, evaluate whether you should upgrade to a more recent version, or open a support ticket with the project's maintainers so they can better support their users.

You can find a guide on all of the version-specific changes of Spark here: https://spark.apache.org/docs/latest/sql-programming-guide.html#upgrading-from-spark-sql-16-to-20

You can also find version-specific documentation on Spark and PySpark here (e.g. for version 1.6.1): https://spark.apache.org/docs/1.6.1/sql-programming-guide.html

devinbost