I want to extract the values from a JSON-formatted column in a Spark DataFrame.
The allrules_internal table in Hive:
+-----------+-------------------------------------+--------+
| tablename | condition                           | filter |
+-----------+-------------------------------------+--------+
| documents | {"col_list":"document_id,comments"} | NA     |
| person    | {"per_list":"person_id, name, age"} | NA     |
+-----------+-------------------------------------+--------+
Code:
val allrulesDF = spark.read.table("default" + "." + "allrules_internal")
allrulesDF.show()
val df1 = allrulesDF.select(allrulesDF.col("tablename"), allrulesDF.col("condition"), allrulesDF.col("filter"), allrulesDF.col("dbname")).collect()
Here I want to split the condition column values. From the example above, I want to keep only the "document_id,comments" part. In other words, the condition column holds a key/value pair, but I only want the value part.
If the allrules_internal table has more than one row, how do I split the value for each row?
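One way to get the value part for every row at once is to let Spark parse the JSON before anything is collected. The sketch below assumes Spark 2.3 or later (for `map_values`) and assumes each condition is a flat JSON object with a single string value. The in-memory DataFrame is a hypothetical stand-in for the Hive table, and `ConditionValues`, `withConditionValue` and `condition_value` are names made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, map_values}
import org.apache.spark.sql.types.{MapType, StringType}

object ConditionValues {
  // Parses each condition string as a JSON object with string keys and string values,
  // then keeps the first value, whatever the key happens to be called.
  def withConditionValue(df: DataFrame): DataFrame = {
    val parsed = from_json(col("condition"), MapType(StringType, StringType))
    df.withColumn("condition_value", map_values(parsed).getItem(0))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: a local session, only for this sketch
      .appName("condition-values")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for spark.read.table("default.allrules_internal")
    val allrulesDF = Seq(
      ("documents", """{"col_list":"document_id,comments"}""", "NA"),
      ("person", """{"per_list":"person_id, name, age"}""", "NA")
    ).toDF("tablename", "condition", "filter")

    // Adds a condition_value column holding only the value part of each JSON object
    withConditionValue(allrulesDF).show(false)

    spark.stop()
  }
}
```

Because the key differs per row (col_list, per_list), reading the JSON as a map avoids hard-coding a key name, which a call such as `get_json_object` would require.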
Current loop:
df1.foreach(row => {
  // condition = row.getAs("condition").toString() // how do I retrieve only the value part here?
  println(condition)
  val tableConditionDF = spark.sql("SELECT " + condition + " FROM " + db_name + "." + table_name)
  tableConditionDF.show()
})
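If the rows are already collected on the driver, as in the loop above, the value can also be pulled out of each string in plain Scala. This is only a minimal sketch: it assumes every condition is a flat, single-key JSON object whose value contains no escaped quotes, and `ConditionSql`, `conditionValue` and `buildSelect` are hypothetical helper names. For anything less regular, a real JSON parser is the safer choice.

```scala
object ConditionSql {
  // Matches a flat, single-key JSON object such as {"col_list":"a,b"} and captures the value.
  // Assumption: the value contains no escaped quotes or nested objects.
  private val SingleValue = """^\s*\{\s*"[^"]*"\s*:\s*"([^"]*)"\s*\}\s*$""".r

  // Returns the value part, or None when the string does not have the expected shape.
  def conditionValue(json: String): Option[String] = json match {
    case SingleValue(value) => Some(value)
    case _                  => None
  }

  // Builds the SELECT statement for one row of the rules table.
  def buildSelect(dbName: String, tableName: String, conditionJson: String): Option[String] =
    conditionValue(conditionJson).map { cols =>
      // Normalise spacing around the comma-separated column names
      val columnList = cols.split(",").map(_.trim).filter(_.nonEmpty).mkString(", ")
      s"SELECT $columnList FROM $dbName.$tableName"
    }
}
```

Inside the loop, something like `ConditionSql.buildSelect(db_name, table_name, row.getAs[String]("condition")).foreach(sql => spark.sql(sql).show())` would then run one query per row and skip rows whose condition does not parse.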