
I am new to PySpark. Can you please help me get the max age from this JSON using PySpark? I tried df.filter(df['employees.age'] > 22).show() and it throws this error:

org.apache.spark.sql.AnalysisException: cannot resolve '(employees.age > 22)' due to data type mismatch: differing types in '(employees.age > 22)' (array and int).;; 'Filter (employees#0.age > 22)

{'employees': [{'age': '12', 'firstName': 'John', 'lastName': 'Doe'},
  {'age': '14', 'firstName': 'Anna', 'lastName': 'Smith'},
  {'age': '54', 'firstName': 'Peter1', 'lastName': 'Jones1'},
  {'age': '44', 'firstName': 'Peter2', 'lastName': 'Jones2'},
  {'age': '42', 'firstName': 'Peter3', 'lastName': 'Jones3'},
  {'age': '62', 'firstName': 'Peter4', 'lastName': 'Jones4'},
  {'age': '65', 'firstName': 'Peter5', 'lastName': 'Jones5'},
  {'age': '23', 'firstName': 'Peter6', 'lastName': 'Jones6'},
  {'age': '77', 'firstName': 'Pete7', 'lastName': 'Jones7'},
  {'age': '82', 'firstName': 'Peter8', 'lastName': 'Jones8'},
  {'age': '92', 'firstName': 'Peter9', 'lastName': 'Jones9'},
  {'age': '78', 'firstName': 'Peter10', 'lastName': 'Jones10'}]}

I want to find the employees whose age is greater than 22.

pault
Prashant Patel

2 Answers


It looks like employees.age is a list (array). That is true even if the array holds only one item. Try indexing into the array in your code:

df.filter(df['employees'][0]['age'] > 22)

Note that index 0 only checks the first employee in the array.
mayank agrawal
Aloha

Spark doesn't handle multi-line JSON very well: by default, spark.read.json expects one JSON object per line.

The example code you linked to (in the examples folder) shows how you need to lay out the file:

single objects on separate lines, not one object containing an array.

{'age': '54', 'firstName': 'Peter1', 'lastName': 'Jones1'}
{'age': '44', 'firstName': 'Peter2', 'lastName': 'Jones2'}
{'age': '42', 'firstName': 'Peter3', 'lastName': 'Jones3'}

Also, JSON requires double quotes around keys and string values, so you'll need to fix that.
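A minimal sketch of both fixes using only Python's standard library, assuming the data is available as the single-quoted text shown in the question (only the first three employees here): ast.literal_eval parses the Python-style literal, and json.dumps writes each employee as one double-quoted JSON object per line. Converting age to an int is an extra assumption, not something the answer asks for; it makes Spark infer age as a numeric column.

```python
import ast
import json

# Hypothetical input: part of the question's data as text, with single quotes.
raw = """{'employees': [{'age': '12', 'firstName': 'John', 'lastName': 'Doe'},
  {'age': '14', 'firstName': 'Anna', 'lastName': 'Smith'},
  {'age': '54', 'firstName': 'Peter1', 'lastName': 'Jones1'}]}"""

# ast.literal_eval safely parses Python literals, including single-quoted strings.
obj = ast.literal_eval(raw)

# json.dumps always writes double quotes; one object per line gives JSON Lines.
# Converting age to int is an optional extra so Spark infers a numeric column.
lines = [json.dumps({**emp, "age": int(emp["age"])}) for emp in obj["employees"]]
jsonl = "\n".join(lines) + "\n"
print(jsonl, end="")
```

Writing jsonl to a file (for example a hypothetical employees.jsonl) gives the one-object-per-line layout that spark.read.json expects by default.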

Then read the file (spark.read is a property, not a method, so there are no parentheses after read):

df = spark.read.json("file.json")

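And to find the max age (age is a string in this data, and groupBy().max() only accepts numeric columns, so cast it first):

from pyspark.sql import functions as F
df.agg(F.max(F.col("age").cast("int")).alias("max_age")).show()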
OneCricketeer