I have some huge JSON files I need to profile so I can transform them into some tables. I found jq
to be really useful for inspecting them, but there are going to be hundreds of these, and I'm pretty new to jq.
I already have some really handy functions in my ~/.jq (big thank you to @mikehwang):
def profile_object:
  to_entries
  | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
  | sort_by(.key) | from_entries;

def profile_array_objects:
  map(profile_object) | map(to_entries)
  | reduce .[] as $item ([]; . + $item)
  | sort_by(.key) | from_entries;
I'm sure I'll have to modify them after I describe my question.
I'd like a jq line to profile a single object. If a key maps to an array of objects, then collect the unique keys across the objects and keep profiling down if there are nested arrays of objects there. If a value is an object, profile that object.
Sorry for the long example, but imagine several GBs of this:
{
  "name": "XYZ Company",
  "type": "Contractors",
  "reporting": [
    {
      "group_id": "660",
      "groups": [
        {
          "ids": [
            987654321,
            987654321,
            987654321
          ],
          "market": {
            "name": "Austin, TX",
            "value": "873275"
          }
        },
        {
          "ids": [
            987654321,
            987654321,
            987654321
          ],
          "market": {
            "name": "Nashville, TN",
            "value": "2393287"
          }
        }
      ]
    }
  ],
  "product_agreements": [
    {
      "negotiation_arrangement": "FFVII",
      "code": "84144",
      "type": "DJ",
      "type_version": "V10",
      "description": "DJ in a mask",
      "name": "Claptone",
      "negotiated_rates": [
        {
          "company_references": [
            1,
            5,
            458
          ],
          "negotiated_prices": [
            {
              "type": "negotiated",
              "rate": 17.73,
              "expiration_date": "9999-12-31",
              "code": [
                "11"
              ],
              "billing_modifier_code": [
                "124"
              ],
              "billing_class": "professional"
            }
          ]
        },
        {
          "company_references": [
            747
          ],
          "negotiated_prices": [
            {
              "type": "fee",
              "rate": 28.42,
              "expiration_date": "9999-12-31",
              "code": [
                "11"
              ],
              "billing_class": "professional"
            }
          ]
        }
      ]
    },
    {
      "negotiation_arrangement": "MGS3",
      "name": "David Byrne",
      "type": "Producer",
      "type_version": "V10",
      "code": "654321",
      "description": "Frontman from Talking Heads",
      "negotiated_rates": [
        {
          "company_references": [
            1,
            9,
            2344,
            8456
          ],
          "negotiated_prices": [
            {
              "type": "negotiated",
              "rate": 68.73,
              "expiration_date": "9999-12-31",
              "code": [
                "11"
              ],
              "billing_class": "professional"
            }
          ]
        },
        {
          "company_references": [
            679
          ],
          "negotiated_prices": [
            {
              "type": "fee",
              "rate": 89.25,
              "expiration_date": "9999-12-31",
              "code": [
                "11"
              ],
              "billing_class": "professional"
            }
          ]
        }
      ]
    }
  ],
  "version": "1.3.1",
  "last_updated_on": "2023-02-01"
}
Desired output:
{
  "name": "string",
  "type": "string",
  "reporting": [
    {
      "group_id": "string",
      "groups": [
        {
          "ids": [
            "number"
          ],
          "market": {
            "name": "string",
            "value": "string"
          }
        }
      ]
    }
  ],
  "product_agreements": [
    {
      "negotiation_arrangement": "string",
      "code": "string",
      "type": "string",
      "type_version": "string",
      "description": "string",
      "name": "string",
      "negotiated_rates": [
        {
          "company_references": [
            "number"
          ],
          "negotiated_prices": [
            {
              "type": "string",
              "rate": "number",
              "expiration_date": "string",
              "code": [
                "string"
              ],
              "billing_modifier_code": [
                "string"
              ],
              "billing_class": "string"
            }
          ]
        }
      ]
    }
  ],
  "version": "string",
  "last_updated_on": "string"
}
Really sorry if there are any errors in that, but I tried to make it all consistent and about as simple as I could.
To restate the need: recursively profile each key in a JSON object whenever its value is an object or an array. The solution needs to be independent of key names. Happy to clarify further if needed.
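For reference, the recursion described above could be sketched roughly as follows. This is only one possible shape, not a tested solution for the real files. The names `profile_json` and `profile_all` are made up for illustration, and joining mixed types with "|" (for example "null|string") is an assumption about how conflicts should be reported. The idea is to gather every value seen under a key, across all sibling objects, before recursing, so that keys present in only some array elements are still picked up.

```sh
# profile_json is a hypothetical wrapper name; the jq program inside it
# could equally be pasted into ~/.jq as ordinary definitions.
profile_json() {
  jq '
    # Input: an array of sibling values (every value seen for one key, or
    # every element of one array). Output: one merged profile for them.
    def profile_all:
      if length == 0 then empty                # nothing seen: emit nothing
      elif all(.[]; type == "object") then
        # Collect the values of each key across all objects, then recurse.
        reduce (.[] | to_entries[]) as $e ({}; .[$e.key] += [$e.value])
        | map_values(profile_all)
      elif all(.[]; type == "array") then
        [ add | profile_all ]                  # pool the elements of all arrays
      else
        map(type) | unique | join("|")         # scalars, or mixed kinds
      end;

    [.] | profile_all
  ' "$@"
}
```

It would be called as `profile_json some_file.json` (the file name is a placeholder). Note that this reads each whole document into memory, so for files of several GB it may need to be adapted, for instance around jq's `--stream` mode.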