
I'm trying to capture Azure Application Insights events in a structured format using the code below in PySpark (Azure Databricks):

import requests
import json

appId = "..."
appKey = "..."

query = """traces | where timestamp > ago(1d) | order by timestamp"""
params = {"query": query}
headers = {'X-Api-Key': appKey}
url = f'https://api.applicationinsights.io/v1/apps/{appId}/query'
response = requests.get(url, headers=headers, params=params)
logs = json.loads(response.text)

logs_json = json.dumps(logs)  # renamed: assigning to `json` would shadow the imported module

jsonRDD = sc.parallelize([logs_json])
df = spark.read.option('multiline', "true").json(jsonRDD)

display(df) 

However, this returns only the raw JSON structure. How do I convert it into a structured or tabular format?

Please help!

Dipanjan Mallick

1 Answer


registerTempTable has been deprecated since Spark 2.0; use createOrReplaceTempView instead.

df = spark.read.option('multiline', "true").json(jsonRDD)
df.createOrReplaceTempView("traces")
result = spark.sql("SELECT timestamp FROM traces")
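Note that the Application Insights query API wraps results in a tables array, where each table carries separate columns and rows lists, so reading the raw response with spark.read.json gives you nested structs rather than one column per field. A minimal sketch of flattening that shape into flat records first (the sample payload below is illustrative, not real telemetry); the resulting list of dicts can be passed straight to spark.createDataFrame:

```python
import json

# Illustrative payload in the documented shape of the Application Insights
# query API response: a "tables" array, each table with "columns" and "rows".
payload = json.loads("""
{
  "tables": [
    {
      "name": "PrimaryResult",
      "columns": [
        {"name": "timestamp", "type": "datetime"},
        {"name": "message", "type": "string"}
      ],
      "rows": [
        ["2023-01-01T00:00:00Z", "app started"],
        ["2023-01-01T00:05:00Z", "job finished"]
      ]
    }
  ]
}
""")

def table_to_records(table):
    """Pair each row's values with the column names, yielding flat dicts."""
    names = [c["name"] for c in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]

records = table_to_records(payload["tables"][0])
# Each record is now a flat dict such as
# {'timestamp': '2023-01-01T00:00:00Z', 'message': 'app started'}.
# In Databricks you could then build a tabular DataFrame directly:
# df = spark.createDataFrame(records)
```

This sidesteps the nested-struct problem entirely, because the flattening happens before Spark ever sees the data.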

PySpark SQL also lets you create a temporary view directly from a JSON file, using spark.sql to load the JSON into the view:

spark.sql("CREATE OR REPLACE TEMPORARY VIEW zipcode USING json OPTIONS" + 
      " (path 'resources/zipcodes.json')")
spark.sql("select * from zipcode").show()

You can refer to How to parse nested JSON objects in spark sql? and PySpark Read JSON file into DataFrame.

Ecstasy