
I'm trying to capture Azure Application Insights events in a structured format using the code below in PySpark (Azure Databricks):

import requests
import json

appId = "..."
appKey = "..."

query = """traces | where timestamp > ago(1d) | order by timestamp"""
params = {"query": query}
headers = {'X-Api-Key': appKey}
url = f'https://api.applicationinsights.io/v1/apps/{appId}/query'
response = requests.get(url, headers=headers, params=params)
logs = json.loads(response.text)

logs_json = json.dumps(logs)  # renamed: assigning to `json` would shadow the imported module

jsonRDD = sc.parallelize([logs_json])
df = spark.read.option('multiline', "true").json(jsonRDD)

display(df) 

However, this returns only the raw JSON structure. How do I convert it into a structured or tabular format?

Please help!

Dipanjan Mallick

1 Answer


registerTempTable has been deprecated since Spark 2.0; use createOrReplaceTempView instead.

df = spark.read.option('multiline', "true").json(jsonRDD)
df.createOrReplaceTempView("traces")
result = spark.sql("SELECT timestamp FROM traces")
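Note that the Application Insights query API wraps results in a tables array, where each table carries separate columns and rows lists, so reading the raw response with spark.read.json gives you nested structs rather than one column per field. A minimal sketch of flattening that shape into flat records first (the sample payload below is illustrative, not real telemetry); the resulting list of dicts can be passed straight to spark.createDataFrame:

```python
import json

# Illustrative payload in the documented shape of the Application Insights
# query API response: a "tables" array, each table with "columns" and "rows".
payload = json.loads("""
{
  "tables": [
    {
      "name": "PrimaryResult",
      "columns": [
        {"name": "timestamp", "type": "datetime"},
        {"name": "message", "type": "string"}
      ],
      "rows": [
        ["2023-01-01T00:00:00Z", "app started"],
        ["2023-01-01T00:05:00Z", "job finished"]
      ]
    }
  ]
}
""")

def table_to_records(table):
    """Pair each row's values with the column names, yielding flat dicts."""
    names = [c["name"] for c in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]

records = table_to_records(payload["tables"][0])
# Each record is now a flat dict such as
# {'timestamp': '2023-01-01T00:00:00Z', 'message': 'app started'}.
# In Databricks you could then build a tabular DataFrame directly:
# df = spark.createDataFrame(records)
```

This sidesteps the nested-struct problem entirely, because the flattening happens before Spark ever sees the data.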

PySpark SQL also lets you create a temporary view directly from a JSON file, using spark.sql to load the JSON into the view:

spark.sql("CREATE OR REPLACE TEMPORARY VIEW zipcode USING json OPTIONS" + 
      " (path 'resources/zipcodes.json')")
spark.sql("select * from zipcode").show()

You can refer to How to parse nested JSON objects in spark sql? and PySpark Read JSON file into DataFrame.

Ecstasy