I have a U-SQL application that runs in the Azure Data Lake environment. It's supposed to process a file full of JSON data that looks like this, except for being a lot more than two rows in real life.
[
{"reports" : {"direction": "FWD", "drive": "STOPS", "frob_variable": 0}},
{"reports" : {"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}}
]
In that Data Lake job, I have the following line:
@json =
EXTRACT direction string, drive string, frob_variable int FROM @"/input/file.json"
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("reports");
When I dump the contents of that @json
variable to a text file I get empty values: zero-length strings and zero-valued integers. I do get the correct number of output rows though, so it must be iterating over all my input.
A bit of poking around the source code to the JsonExtractor
shows me that my specified JsonPath value ("reports") seems to be returning the "reports" key with the embedded dict. If I try a JsonPath value of "reports.*" I do get the embedded values (e.g., { "FWD", "STOPS", 0 }
) but I really wanted the keys to go along with them so SELECT direction, drive, frob_variable
would return something useful.
Long story short, I'm looking for a way to pull the keys and values from that inner dict. Thus my desired output from the EXTRACT
would be a rowset whose columns are "direction", "drive", and "frob_variable" and whose values are as indicated in the source data. It seems like there should be a JsonPath solution or a simple workaround in U-SQL.