I want to use a relational database to analyze information from Songkick's JSON API for local events.
The event objects in are complex and deeply nested, so I want to filter and flatten the event objects and convert them to CSV so I can load them with standard tools.
Can I use jq to filter and flatten the events?
A typical response from the API is too large to show here. I will show a simplified version with the same relative structure.
Applying the filter .resultsPage.results.event[]
to the response produces a stream of event objects like this.
{
"start": {
"date": "2014-10-28"
},
"performance": [
{
"artist": {
"displayName": "James Keelaghan",
"identifier": [
{
"mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff"
}
]
}
}
],
"venue": {
"displayName": "Live At The Star"
}
}
{
"start": {
"date": "2014-10-28"
},
"performance": [
{
"artist": {
"displayName": "Katy B",
"identifier": [
{
"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"
}
]
}
},
{
"artist": {
"displayName": "Becky Hill",
"identifier": [
{
"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"
}
]
}
}
],
"venue": {
"displayName": "O2 ABC"
}
}
Next I want to produce one output object for each object in the performance list. These new objects should have attributes from the containing event object, such as date and venue.
The correct output for the example would look like this.
{
"venue_name": "Live At The Star",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
If I ignore the mbid, this jq filter gives me what I want.
{
start_date: .start.date,
artist_name: .performance[].artist.displayName,
venue_name: .venue.displayName
}
The result looks like this.
{
"venue_name": "Live At The Star",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
I tried this filter to get the mbid as well.
{
start_date: .start.date,
artist_name: .performance[].artist.displayName,
artist_mbid: .performance[].artist.identifier[].mbid,
venue_name: .venue.displayName
}
The result looks like this.
{
"venue_name": "Live At The Star",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
Each object looks right, but there are too many of them! The "Katy B" and "Becky Hill" objects are duplicated.
What is the correct way to do this in jq?