How to use JSON file formats in the context of Azure Data Lake Analytics respectively usql
Set Up process
- Download the repo from [1]
- Open the solution in
.\Examples\DataFormats\Microsoft.Analytics.Samples.sln
- Build the solution
- Get
.\Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\Microsoft.Analytics.Samples.Formats.dll
- Get
.\Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\Newtonsoft.Json.dll
- In ADLS create a folder where to store the assemblies (e.g.
.\assemblies
)
- Create a sample file in the folder if it disappears (happened to me)
- With Visual Studio:
- Add both files to the folder of the ADLA solution (e.g.
.\lib\...
)
- Open
Cloud Explorer
, navigate to the ADLA Database -> Assemblies -> right click and register the assemblies
- Store the assemblies in the folder created before in ADLS
- Reference the assemblies as can be seen below
Usage
JSON Input Data
- Use the Extractor as e.g. shown below
- For more information see [2] and [3]
Resources
[1] GitHub Azure USQL
[2] GitHub Azure USQL DataFormats
[3] U-SQL - Extract data from json-array
U-SQL Script
DECLARE @localDevelopment bool = true;
IF @localDevelopment == true THEN
DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/lib/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/lib/Microsoft.Analytics.Samples.Formats.dll";
DECLARE @input string = @"/data/input.json";
DECLARE @output string = @"/data/output.csv";
ELSE
DECLARE @input string = @"/data/input.json";
DECLARE @output string = @"/data/output.csv";
END;
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
//Define schema of file, must map all columns
//Names must match keys
@extractDataFirstLevel =
EXTRACT sessionId int,
deviceId string,
IoTHub string
//Date DateTime
FROM @input
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@selectData =
SELECT sessionId,
deviceId,
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(IoTHub)["MessageId"] AS messageId
FROM @extractDataFirstLevel;
OUTPUT @selectData
TO @output
USING Outputters.Csv();