I need to convert hierarchical data (AVRO data, which boils down to JSON) into tabular data (csv). Since AVRO have strict schema, I know essentially what form the JSON will take, but I have to do this for many different schema, so I'm looking for a consistent, declarative way to express the transformations I need to make. For example, if my incoming data looks like this…
{
"customers": [
{
"addresses": [
{
"city": "Los Angeles",
"country": "USA",
"county": null,
"postalCode": "90064",
"stateOrProvince": "California",
"street1": "11832 W. Pico Blvd.",
"street2": "",
"street3": "",
"street4": "",
"tags": [
"BILLING"
]
}
],
"company": "",
"dateCreated": "2009-04-24T11:42:31+00:00",
"dateOfBirth": null,
"doNotCall": null,
"email": {
"emailAddress": "general@magentocommerce.com"
},
"emailOptOut": null,
"fullName": {
"firstName": "Test",
"lastName": "General",
"middleName": "",
"prefix": "",
"suffix": ""
},
"gender": null,
"id": {
"Id": "2",
"namespace": "1000020016"
},
"lastModified": "2009-05-08T23:33:06+00:00",
"primaryPhone": {
"number": "866.4.VARIEN",
"type": "UNKNOWN"
},
"sourceIds": null
}
],
"totalItemsFound": 3
}
…I might need to output one row for each customer, like this:
MERCHANT ID|NUM CUSTOMERS|ID|FIRST NAME|LAST NAME|EMAIL|PHONE|STREET|CITY|STATE|ZIP|COUNTRY|EMAIL PREFERENCE
some.merch|3|1000020016-2|Test|General|general@magentocommerce.com|866.4.VARIEN|11832 W. Pico Blvd.|Los Angeles|California|90064|USA|N
I need to be able to express the following things:
- Get all the values from a given key as an array: All the dates-of-birth
- Repeat one value over every row: totalItemsFound, repeated in every row
- Repeat a static value in every row that comes from static data I already know the merchant channel never changes
- And the tricky one: Arbitrarily manipulate the incoming data to produce the desired output:
- Convert the customer's id into namespace-id
- Invert and change a null/boolean value into y/n, as in emailOptOut to EMAIL PREFERENCE
- (re-)format a date or currency
- etc
I started out with jsonpath, but that only solves #1 above. I've been slowly adding a language around jsonpath to serve 2 and 3, but I really don't have a good answer for 4 (besides eval., and I'd really hate to do that). I looked at JSON/T, but couldn't find a python library for it. I even seriously considered writing a middleware to convert the JSON into XML so that I could use XSLT, but I'm hoping someone here at S/O has a better solution before I get that desperate.