
I'm trying to read about 300 individual JSON files into PySpark. I can read a single one, but as soon as I use a wildcard it throws an error:

IllegalArgumentException: 'Unsupported class file major version 56'

I've tried applying the following code:

import pyspark
from pyspark import SparkContext, SparkConf

conf = SparkConf()
sc = SparkContext(appName='azure_test', conf=conf)
sqlContext = pyspark.SQLContext(sc)

data = sqlContext.read.json('test_1*.json')

I'd expect the output to be a DataFrame of the JSON files, but instead I got the error mentioned above.

Jim Todd

2 Answers

from pyspark.sql import SparkSession

# One SparkSession is enough; reuse its SparkContext instead of creating a second one.
spark = SparkSession.builder.master("local[2]").getOrCreate()
sc = spark.sparkContext

# Read the files as text, then parse the resulting RDD of JSON strings.
text = sc.textFile("file1,file2...")
ddff = spark.read.json(text)

Or put all the files in one folder and pass the folder path:

sqlContext.read.json("/tmp/test")
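As a side note, the wildcard in the question already selects the right files: Spark expands glob patterns in paths much like a shell does. A small stand-in sketch using Python's `glob` module (no Spark needed, hypothetical temp files) to show what `test_1*.json` matches:

```python
import glob
import json
import os
import tempfile

# Create a few line-delimited JSON files to demonstrate the wildcard pattern.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    path = os.path.join(tmpdir, f"test_1{i}.json")
    with open(path, "w") as f:
        f.write(json.dumps({"id": i}) + "\n")

# Spark's path globbing behaves like shell globbing: 'test_1*.json'
# matches every file with that prefix, so a single read picks up all of them.
matched = sorted(glob.glob(os.path.join(tmpdir, "test_1*.json")))
print(len(matched))  # 3
```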
dassum

I think there are no issues with your code, but Spark is not yet compatible with Java 12.
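The number in the error message encodes exactly this: class file major versions are offset from the Java release by 44 (52 = Java 8, 56 = Java 12). A quick sketch of the mapping:

```python
def java_release(class_file_major: int) -> int:
    """Class file major versions are the Java release plus 44
    (Java 8 -> 52, Java 11 -> 55, Java 12 -> 56)."""
    return class_file_major - 44

print(java_release(56))  # 12 -- the version named in the error
print(java_release(52))  # 8  -- the version Spark 2.x supports
```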

Run with Java 8 and then try to read the JSON files:

import pyspark
from pyspark import SparkContext, SparkConf

conf = SparkConf()
sc = SparkContext(appName='azure_test', conf=conf)
sqlContext = pyspark.SQLContext(sc)

data = sqlContext.read.json('test_1*.json')

From Spark 2.0 you can also read multiline JSON through the SparkSession API:

spark.read.option("multiline", True).json("<file_path_to_test_1*.json>").show()
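The `multiline` option matters because, by default, Spark expects JSON Lines input: one complete object per line. A pretty-printed file where one object spans several lines needs `multiline` set to `True`. A plain-Python sketch of the two shapes (hypothetical data, using the stdlib `json` module to stand in for Spark's parser):

```python
import json

# JSON Lines: one object per line -- Spark's default expectation.
json_lines = '{"id": 1}\n{"id": 2}\n'
records = [json.loads(line) for line in json_lines.splitlines()]
print(len(records))  # 2

# Pretty-printed ("multiline") JSON: a single object spread across lines.
# Spark needs .option("multiline", True) to parse files shaped like this.
multiline = '{\n  "id": 1\n}\n'
print(json.loads(multiline)["id"])  # 1
```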
notNull