1

I am creating a GitHub Actions workflow that uses the GitHub CLI to make a GraphQL API request. The gh api graphql call uses --paginate, and it returns JSON Lines (ndjson).

I created the GraphQL and jq queries, and I am close to the desired output; however, my jq query needs to be modified and I can't figure out what to change.

First, here is the desired output format I want to achieve. Notice the single object that holds all the key-value lineage information.

[
  {
    KEY: VALUE,
    KEY: VALUE,
    ...
  }
]

And here is the actual format of the output that I am getting. Notice that every key-value pair is wrapped in its own object.

[
  {
    KEY: VALUE,
  },
  {
    KEY: VALUE,
  },
  ...
]

Here is my current jq filter, along with a snippet of the GraphQL response, in jq play. The snippet contains 2 JSON Lines (jsonl, ndjson) entries, pretty-printed. Search for `data` to find each individual response.

I need to run jq with --slurp/-s because the results are paginated.

I want to only include milestones which:

  • have 100% progress
  • don't include the word "withdrawn" within the title
  • have issues associated with them.

Also, if the milestone title contains either a space (" ") or a comma followed by a space (", "), then I need to split the title. Each part becomes its own key, and all parts share the same value.

Here is my jq query that needs to be modified:

.[] | .data.repository as {
    nameWithOwner: $name, 
    milestones: { 
        nodes: $milestones
    }
}
| [
    foreach $milestones[] as $milestone (
        null; $milestone ; 
        $milestone
        | select($milestone.progressPercentage == 100)
        | select($milestone.title | contains("withdrawn") | not)
        | select($milestone.issues.nodes[])
        |
        {
            (($milestone.title | gsub(", "; " ") | split(" "))[]) : 
            [
                foreach $milestone.issues.nodes[] as $issue (
                    {}; . + { $issue };
                    $issue as $issue | $issue
                    | (reduce $issue.labels.nodes[] as $item ([]; . + [$item.name])) as $labels
                    |
                    {
                        repo: $name,
                        issue: $issue.number,
                        milestone: $milestone.number,
                        labels: $labels
                    }
                    
                )
            ]
        }
    )
]
| .

Here is a small JSON snippet that needs to be filtered by jq. It has 2 milestones but should produce 3 key-value pairs (keys: C.1, EXAMPLE_SPLIT, and B.1.429):

{
  "data": {
    "repository": {
      "nameWithOwner": "cov-lineages/pango-designation",
      "milestones": {
        "pageInfo": {
          "hasNextPage": true,
          "endCursor": "Y3Vyc29yOnYyOpHOAGviZA=="
        },
        "nodes": [
          {
            "number": 1,
            "title": "C.1, EXAMPLE_SPLIT",
            "progressPercentage": 100,
            "issues": {
              "nodes": [
                {
                  "number": 2,
                  "labels": {
                    "nodes": [
                      {
                        "name": "proposed"
                      },
                      {
                        "name": "designated"
                      }
                    ]
                  }
                }
              ]
            }
          },
          {
            "number": 2,
            "title": "B.1.429",
            "progressPercentage": 100,
            "issues": {
              "nodes": [
                {
                  "number": 3,
                  "labels": {
                    "nodes": [
                      {
                        "name": "proposed"
                      },
                      {
                        "name": "designated"
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
Christopher Rucinski
  • Slurping the input and starting out with `.[] | ...` has the same effect as omitting both. You either want to use `map` instead (while slurping), or use `--null-input` in combination with `inputs`, which you can use, for example, as a generator in your `foreach` loop. Given your sample data, replacing the superfluous `| .` at the end of your filter with `| add` will combine the objects from the array into a single object, but I doubt this alone will also work with a stream of inputs for the reasons given above. – pmf Aug 26 '23 at 10:47
  • @pmf I only have a week of jq knowledge, but thank you for this info. I'm confused by the 1st sentence as I get errors in jq play if I don't `.[] | ` at the start while slurping. Anyway, the `| add` solution works for the minimized JSON, but not the real data, so I have to try either the `map` or `--null-input` solution. I'm leaning towards `map` as I have used it before. Should I just replace my outer `foreach` with `map` and slowly test out how to rebuild the filter? – Christopher Rucinski Aug 26 '23 at 11:06
  • I'm not at home right now, so all I could do was take a cursory look and leave a comment. I'll have a closer look once I get home (in a few hours) if nobody else has stepped in by then. As for the first sentence, the "slurping" just wraps the input stream of JSONs into an array and provides it to your filter as its initial context. If your first filter is decomposing it into its items again, you're back at square one with a stream of JSONs. – pmf Aug 26 '23 at 11:21

2 Answers

2

Something like this?

.data.repository
| .nameWithOwner as $repo
| .milestones.nodes
| map( # create a new array containing the milestones
    # filter interesting milestone nodes
    select(.progressPercentage == 100)
    | select(.title | contains("withdrawn") | not)
    | select(.issues.nodes | length > 0)
    | {
      (.title | splits(",? ")): [ # one object per title part, each object containing an array
        { $repo, milestone: .number } # base milestone data, plus …
        + (.issues.nodes[] | {
          issue: .number,
          labels: [.labels.nodes[].name] # collect all label names in an array
        })
      ]
    }
  )
| add # merge all objects of the array into a single object

This might not be the most efficient solution compared to a reduce-based approach (it creates intermediate arrays), but it is easy to follow and can be divided into "logical" parts.

Run it with plain jq (no slurping is needed when there's only a single top-level element).

Output with the example data from the question:

{
  "C.1": [
    {
      "repo": "cov-lineages/pango-designation",
      "milestone": 1,
      "issue": 2,
      "labels": [
        "proposed",
        "designated"
      ]
    }
  ],
  "EXAMPLE_SPLIT": [
    {
      "repo": "cov-lineages/pango-designation",
      "milestone": 1,
      "issue": 2,
      "labels": [
        "proposed",
        "designated"
      ]
    }
  ],
  "B.1.429": [
    {
      "repo": "cov-lineages/pango-designation",
      "milestone": 2,
      "issue": 3,
      "labels": [
        "proposed",
        "designated"
      ]
    }
  ]
}
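
The piece of the filter that turns one title into several keys is the parenthesised key expression: when an object key is a generator, jq emits one object per generated key. A minimal sketch of just that step, using a made-up, cut-down milestone rather than the real GraphQL response (the variable names are only for illustration):

```shell
# Hypothetical, cut-down milestone; only the fields needed for the demo.
milestone='{"title":"C.1, EXAMPLE_SPLIT","number":1}'

# splits(",? ") yields one string per title part, and the object
# constructor emits one object for each of those strings.
parts=$(printf '%s\n' "$milestone" | jq -c '{(.title | splits(",? ")): .number}')

printf '%s\n' "$parts"
```

This should print two separate objects, one keyed "C.1" and one keyed "EXAMPLE_SPLIT", which is why the surrounding `map(…)` is followed by `add`.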

If your input contains multiple objects and you want the final output to be a single object, use -s (--slurp) in combination with map(…):

jq -s 'map( # -s reads everything as one big array, `map` transforms the elements of this array
  .data.repository
  | .nameWithOwner as $repo
  | .milestones.nodes
  | map( # create a new array containing the milestones
    select(.progressPercentage == 100) | select(.title | contains("withdrawn") | not) | select(.issues.nodes | length > 0) # filter interesting milestone nodes
    | {
      (.title | splits(",? ")): [ # one object per title part containing an array
          { $repo, milestone: .number } # base milestone data, plus …
          + (.issues.nodes[] | {
            issue: .number,
            labels: [.labels.nodes[].name] # collect all label names in an array
          })
      ]
    }
  )
)
| add # merge all objects of the array into a single object
'
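
One caveat with `add`, relevant if the same key can turn up in more than one object (for example when results from two repositories are merged): adding two objects keeps only the right-hand value for a repeated key. Below is a sketch of one way to append the arrays instead. The data is made up, and `reduce` over `to_entries` is just one option, not necessarily the best fit for the real response:

```shell
# Hypothetical per-repository results that both contain the key "B.1".
results='[{"B.1":[{"repo":"a/x","issue":1}]},{"B.1":[{"repo":"b/y","issue":7}]}]'

# add: for a repeated key the later object wins, so the a/x entry is lost.
overwritten=$(printf '%s\n' "$results" | jq -c 'add')

# reduce over every key/value entry and append the arrays, so both survive.
combined=$(printf '%s\n' "$results" |
  jq -c 'reduce (.[] | to_entries[]) as $e ({}; .[$e.key] += $e.value)')

printf '%s\n' "$overwritten" "$combined"
```

With this input, the first result should contain only the b/y entry under "B.1", while the second should contain both entries.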
knittl
  • Great suggestion that I can learn from! Seems like I can go without `--slurp` even with multiple `data` responses (https://jqplay.org/s/XXDKkrf_D05) – Christopher Rucinski Aug 26 '23 at 14:52
  • @ChristopherRucinski but you will end up with several objects in your output. That might or might not be what you want – knittl Aug 26 '23 at 15:02
  • Ahhh, I see. Just before `AY.25` I can see the multiple arrays concatenated together. Is there anything else that can be done to fix that in jq? In the shell, I might have to replace `}{` with `,` (comma). I will be querying 2 different repos, and then I have to merge the 2 results into 1 valid JSON where a single key might have multiple entries from both repos – Christopher Rucinski Aug 26 '23 at 15:17
  • @ChristopherRucinski to combine multiple objects, use `add` (but that means that all your objects must be put into an array first – instead of `.[]`, you might want to use `map(…)` combined with `--slurp`, see https://stackoverflow.com/questions/73843868/difference-between-slurp-null-input-and-inputs-filter) – knittl Aug 26 '23 at 15:56
  • Oh, I just saw your edit. I've been trying to learn what to change based on that other question, and once again it is much simpler than I thought. – Christopher Rucinski Aug 26 '23 at 17:04
1

100 milestones: https://jqplay.org/s/hmpl7oz5K2z

[ 
  .data.repository 
  | .nameWithOwner as $repo 
  | .milestones.nodes[] 
  | select(
      .progressPercentage == 100 and 
      (.title | contains("withdrawn") | not) and 
      .issues.nodes[]
    )
  | 
  { 
    (.title | splits(",? ")): (
        { $repo, milestone: .number } + 
        (.issues.nodes[] | { issue: .number, labels: [ .labels.nodes[].name ] }))
  }
] 
| add

Output with the example data from the question:

{
  "C.1": {
    "repo": "cov-lineages/pango-designation",
    "milestone": 1,
    "issue": 2,
    "labels": [
      "proposed",
      "designated"
    ]
  },
  "EXAMPLE_SPLIT": {
    "repo": "cov-lineages/pango-designation",
    "milestone": 1,
    "issue": 2,
    "labels": [
      "proposed",
      "designated"
    ]
  },
  "B.1.429": {
    "repo": "cov-lineages/pango-designation",
    "milestone": 2,
    "issue": 3,
    "labels": [
      "proposed",
      "designated"
    ]
  }
}
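
A point worth checking with this variant: each key maps to a single object rather than an array. If a milestone has several issues, the object constructor emits one object per issue, and `add` then keeps only the last one for that key. Wrapping the value in an array keeps them all. A minimal sketch with a made-up, flattened milestone (the field layout here is simplified and is not the real GraphQL shape):

```shell
# Hypothetical milestone with two issues; simplified field layout.
milestone='{"title":"B.1","issues":[{"number":3},{"number":4}]}'

# Object value is a generator: two objects with the same key, add keeps the last.
last_only=$(printf '%s\n' "$milestone" |
  jq -c '[{(.title): (.issues[] | {issue: .number})}] | add')

# Array value: every issue is collected under the key.
all_issues=$(printf '%s\n' "$milestone" |
  jq -c '{(.title): [.issues[] | {issue: .number}]}')

printf '%s\n' "$last_only" "$all_issues"
```

With the sample data from the question every milestone has exactly one issue, so the difference does not show up there.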
jqurious
  • I swear, I did not copy your answer!! Creepy that we came up with two identical solutions independently (minus the `select` and `map` differences; but the `splits` filter is a great substitute for `gsub | split`) – knittl Aug 26 '23 at 13:28
  • @knittl A glitch in the matrix :-D – jqurious Aug 26 '23 at 13:40
  • Thank you @jqurious. Amazing suggestions that I can look over and learn from. And I had never seen the share button at jq play; that could have simplified my question a lot. Seems like I don't have to use `--slurp` even with multiple `data` responses (https://jqplay.org/s/WFCDHLrlOtT). Crazy how much easier it can be – Christopher Rucinski Aug 26 '23 at 14:54