How do I use jq to convert an arbitrary JSON array of objects to CSV when the objects in the array are nested?
StackOverflow has a sea of questions/answers where specific input or output fields are referenced, but I'd like to have a generic solution that
- includes a header row,
- works for any JSON input including nested arrays + objects,
- allows records that have missing values for keys that are present in other records,
- does not hard-code any field names,
- allows converting the CSV back into the nested JSON structure if needed, and
- uses key paths as header names (see the following description).
Dot notation
Many JSON-using products (like CouchDB, MongoDB, …) and libraries (like Lodash, …) use variations of a syntax that allows access to nested property values / subfields by joining key fragments with a character, often a dot (‘dot notation’).
An example of such a key path would be "a.b.0.c" to refer to the deeply nested property in this JSON snippet:
{
  "a": {
    "b": [
      {
        "c": 123
      }
    ]
  }
}
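To illustrate how such a key path relates to the snippet, here is a minimal sketch using jq's built-in `paths`, `getpath` and `join`. The variable and function names (`snippet`, `leaf_paths`, `leaf_value`) are made up for this illustration, and `paths(scalars)` skips `null` and `false` leaves, which does not matter for this snippet.

```shell
# Hypothetical illustration: the snippet from above as a compact string.
snippet='{"a":{"b":[{"c":123}]}}'

# paths(scalars) emits the path to each leaf as an array, e.g. ["a","b",0,"c"].
# map(tostring) turns the array index 0 into "0" so that join(".") yields a key path.
leaf_paths() {
  jq -r 'paths(scalars) | map(tostring) | join(".")'
}

# getpath resolves the array form of the key path back to the nested value.
leaf_value() {
  jq -c 'getpath(["a","b",0,"c"])'
}

printf '%s' "$snippet" | leaf_paths   # prints a.b.0.c
printf '%s' "$snippet" | leaf_value   # prints 123
```

Note that jq itself keeps the path as an array with a numeric index; the dot-joined string is only a presentation of it.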
Caveat: This method is a pragmatic solution for most cases, but it means that either dot characters have to be banned in property names, or a more complex syntax (with an escape sequence that is definitely never used in property names) has to be invented for escaping dots in property names / accessing nested fields. MongoDB simply banned the use of "." in documents until v5.0; some libraries have workarounds for field access (Lodash example).
Despite this, for simplicity, a solution should use the described dot syntax in the CSV output’s header for nested properties. Bonus if there is a solution variant that solves this problem, e.g. with JSONPath.
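The ambiguity mentioned in the caveat can be seen in a small sketch: two structurally different documents end up with the same dot-joined key path. The sample documents and the function name `to_key_paths` are hypothetical and only serve this illustration.

```shell
# Two structurally different documents (made-up examples).
nested='{"a":{"b":1}}'   # property "b" nested inside "a"
dotted='{"a.b":1}'       # a single property whose name contains a dot

# Join the path fragments of every leaf with a dot, without any escaping.
to_key_paths() {
  jq -r 'paths(scalars) | map(tostring) | join(".")'
}

printf '%s' "$nested" | to_key_paths   # prints a.b
printf '%s' "$dotted" | to_key_paths   # prints a.b as well
```

Since both documents produce the same header name, a CSV built this way could not be converted back unambiguously once property names contain dots.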
Example JSON array as input
[
  {
    "a": {
      "b": [
        {
          "c": 123
        }
      ]
    }
  },
  {
    "a": {
      "b": [
        {
          "c": "foo \" bar",
          "d": "qux"
        }
      ]
    }
  },
  {
    "a": {
      "b": [
        {
          "d": 456
        }
      ]
    }
  }
]
Example CSV output
The output should have a header that includes all fields (even if the first object in the array does not have defined values for all existing key paths).
To make the output intuitively editable by humans, each row should represent one object in the input array.
The expected output should look like this:
"a.b.0.c","a.b.0.d"
123,
"foo "" bar","qux"
,456
Command line
This is what I need:
cat example.json | jq <MISSING CODE HERE>