I have a Hive query that returns data like this:
Date,Name,Score1,Score2,Avg_Score
1/1/2018,A,10,20,15
1/1/2018,B,20,20,20
1/1/2018,C,15,10,12.5
1/1/2018,D,11,12,11.5
1/1/2018,E,21,29,25
1/1/2018,F,10,21,15.5
I use hive_context.sql(my_query).rdd to get this into an RDD.
My ultimate aim is to convert this to JSON, with each record ranked in descending order of Avg_Score, as follows:
Scores =
[
  {
    "Date": "1/1/2018",
    "Name": "A",
    "Avg_Score": 15,
    "Rank": 4
  },
  {
    "Date": "1/1/2018",
    "Name": "B",
    "Avg_Score": 20,
    "Rank": 2
  }
]
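To make the expected ranking concrete, here is a minimal plain-Python sketch (no Spark involved) of the result I am after. The `rows` list just mirrors the sample output above, and `add_desc_rank` is a hypothetical helper name used only for illustration. Giving tied scores the same rank is my assumption; the sample data has no ties.

```python
import json

# Sample rows copied from the query output above (Score1/Score2 omitted,
# since they do not appear in the target JSON).
rows = [
    {"Date": "1/1/2018", "Name": "A", "Avg_Score": 15},
    {"Date": "1/1/2018", "Name": "B", "Avg_Score": 20},
    {"Date": "1/1/2018", "Name": "C", "Avg_Score": 12.5},
    {"Date": "1/1/2018", "Name": "D", "Avg_Score": 11.5},
    {"Date": "1/1/2018", "Name": "E", "Avg_Score": 25},
    {"Date": "1/1/2018", "Name": "F", "Avg_Score": 15.5},
]


def add_desc_rank(records, key="Avg_Score"):
    """Return copies of the records with a 1-based Rank, highest value first.

    Hypothetical helper for illustration only. The input order is preserved.
    """
    ordered = sorted(records, key=lambda r: r[key], reverse=True)
    rank_of = {}
    for position, record in enumerate(ordered, start=1):
        # Assumption: tied values share the rank of the first occurrence.
        rank_of.setdefault(record[key], position)
    return [dict(r, Rank=rank_of[r[key]]) for r in records]


scores = add_desc_rank(rows)
print(json.dumps(scores, indent=2))
```

With the sample data, E (25) should get rank 1, B (20) rank 2, F (15.5) rank 3, and A (15) rank 4, which matches the two records shown above. What I need is the equivalent of this on the Spark side.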
As a first step toward computing the ranks, I tried implementing this approach, but I keep running into errors such as: AttributeError: 'RDD' object has no attribute 'withColumn'
How can I compute the rank and produce the JSON output shown above?