0

I want to use a relational database to analyze information from Songkick's JSON API for local events.

The event objects in are complex and deeply nested, so I want to filter and flatten the event objects and convert them to CSV so I can load them with standard tools.

Can I use jq to filter and flatten the events?

A typical response from the API is too large to show here. I will show a simplified version with the same relative structure.

Applying the filter .resultsPage.results.event[] to the response produces a stream of event objects like this.

{
  "start": {
    "date": "2014-10-28"
  },
  "performance": [
    {
      "artist": {
        "displayName": "James Keelaghan",
        "identifier": [
          {
            "mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff"
          }
        ]
      }
    }
  ],
  "venue": {
    "displayName": "Live At The Star"
  }
}
{
  "start": {
    "date": "2014-10-28"
  },
  "performance": [
    {
      "artist": {
        "displayName": "Katy B",
        "identifier": [
          {
            "mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"
          }
        ]
      }
    },
    {
      "artist": {
        "displayName": "Becky Hill",
        "identifier": [
          {
            "mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"
          }
        ]
      }
    }
  ],
  "venue": {
    "displayName": "O2 ABC"
    }
}

Next I want to produce one output object for each object in the performance list. These new objects should have attributes from the containing event object, such as date and venue.

The correct output for the example would look like this.

{
  "venue_name": "Live At The Star",
  "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
  "artist_name": "James Keelaghan",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}

If I ignore the mbid, this jq filter gives me what I want.

{
  start_date: .start.date,
  artist_name: .performance[].artist.displayName,
  venue_name: .venue.displayName
}

The result looks like this.

{
  "venue_name": "Live At The Star",
  "artist_name": "James Keelaghan",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}

I tried this filter to get the mbid as well.

{
  start_date: .start.date,
  artist_name: .performance[].artist.displayName,
  artist_mbid: .performance[].artist.identifier[].mbid,
  venue_name: .venue.displayName
}

The result looks like this.

{
  "venue_name": "Live At The Star",
  "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
  "artist_name": "James Keelaghan",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}

Each object looks right, but there are too many of them! The "Katy B" and "Becky Hill" objects are duplicated.

What is the correct way to do this in jq?

Iain Samuel McLean Elder
  • 19,791
  • 12
  • 64
  • 80

1 Answers1

1

This filter should work:

.resultsPage.results.event | map(
    {
        venue_name: .venue.displayName,
        start_date: .start.date
    }
    +
    (.performance[].artist | {
        artist_mbid: .identifier[].mbid,
        artist_name: .displayName
    })
)

Though the fields aren't in the same order, but you could always reorder if needed:

[
  {
    "venue_name": "Live At The Star",
    "start_date": "2014-10-28",
    "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
    "artist_name": "James Keelaghan"
  },
  {
    "venue_name": "O2 ABC",
    "start_date": "2014-10-28",
    "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
    "artist_name": "Katy B"
  },
  {
    "venue_name": "O2 ABC",
    "start_date": "2014-10-28",
    "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
    "artist_name": "Becky Hill"
  }
]

You're trying to create an object for every corresponding performance so you'll have to flatten it down a bit before you start collecting results.

Jeff Mercado
  • 129,526
  • 32
  • 251
  • 272
  • The thing you have to be careful of when using multiple `.[]` filters. If there are more than one value for each array, it will create a result for each combination of values in the arrays which is probably not what you want. – Jeff Mercado Oct 31 '14 at 02:34
  • Thanks! I didn't actually notice the objects were permutations rather than duplicates. The other important learning point here is the use of `map` to apply a filter iteratively while preserving the list. – Iain Samuel McLean Elder Nov 02 '14 at 20:17