
I have multiple log messages, each containing a list of JobIds.

For example:

1. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}`
2. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}`
3. `{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}`

I have a rex that extracts those jobIds. Next, I want to count the number of jobIds.

My query looks like this:

    | rex field=message "\"(?<job_ids>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
    | stats count(job_ids)

But this only gives me a count of 3 when I am looking for 5. How can I get a count of all jobIds? I am not sure whether this is a Splunk limitation or I am missing something in my regex.

Here is my regex - https://regex101.com/r/vqlq5j/1

Adjit
  • No, repeated capturing groups always keep the last matched substring in their buffer. Match the whole and split. Or, use several optional non-capturing groups with capturing group inside them if you know there can be a finite, certain amount of these values in the input string. – Wiktor Stribiżew Jan 20 '23 at 20:48
  • @WiktorStribiżew - regex part alone does not solve the ***Splunk*** problem. OP still needs to understand how to produce multi-value fields and how to count them together. – PM 77-1 Jan 20 '23 at 21:53
  • @WiktorStribiżew - you're not wrong *in general*, but that doesn't apply (in the same way) to *Splunk* – warren Jan 23 '23 at 13:17
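To illustrate the general regex point from the comments (outside Splunk), here is a minimal Python sketch. It assumes Python's `re` module, and the pattern and variable names are made up for the illustration. It shows that a capturing group inside a repeated group only retains the last repetition:

```python
import re

# First sample message from the question.
message = '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}'

# Hypothetical pattern: the capturing group sits inside a repeated
# non-capturing group, so each repetition overwrites the previous capture.
repeated = re.search(r'\[(?:"([\w-]+)",?)+\]', message)

# Only the last ID in the list is left in group 1.
last_only = repeated.group(1)
print(last_only)
```

This is why a single repeated group cannot return every ID on its own. How Splunk's `rex` handles multiple matches is a separate matter, as the later comments point out.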

2 Answers


In Splunk, to capture multiple matches from a single event, you need to add `max_match=0` to your `rex`, per the Splunk docs.

But to then separate the [potentially] multivalue field `job_id` into single-value results, you need `mvexpand` or similar.

So this should get you closer:

    | rex field=message max_match=0 "\"(?<job_id>(?:\w+-\w+-\w+-\w+-\w+)+),?\""
    | mvexpand job_id
    | stats dc(job_id)

I also changed from count to dc, as it seems you're looking for a unique count of job IDs, and not just a count of how many in total you've seen
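As a rough analogy outside Splunk, the Python sketch below contrasts taking one match per message with taking every match, and a total count with a distinct count. It is only an illustration under stated assumptions: it uses Python's `re` module rather than `rex`, and the names `ID_PATTERN`, `first_only`, and `all_ids` are made up for the example.

```python
import re

# The three sample messages from the question.
messages = [
    '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}',
    '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}',
    '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}',
]

# One quoted, five-part, hyphen-separated ID.
ID_PATTERN = re.compile(r'"(\w+-\w+-\w+-\w+-\w+)"')

# One match per message: analogous to stopping at the first match per event.
first_only = [ID_PATTERN.search(m).group(1) for m in messages]

# Every match in every message: analogous to keeping all matches, then expanding.
all_ids = [job_id for m in messages for job_id in ID_PATTERN.findall(m)]

total = len(all_ids)          # plain count of every ID seen
distinct = len(set(all_ids))  # unique count, in the spirit of dc()

print(len(first_only), total, distinct)
```

With this sample data the total and distinct counts happen to be equal because no ID repeats. They would differ as soon as the same job ID appeared in more than one message.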

Note: if this is JSON data (and not JSON-inside-JSON) coming into Splunk, and the sourcetype is configured correctly, you shouldn't have to manually extract the multivalue field, as Splunk will do it automatically

Do you have a full set of sample data (a few entire events) you can share?

warren

Also with `max_match=0`, but using `mvcount()` instead of `mvexpand`:

    | makeresults count=3 | streamstats count
    | eval message=case(count=1, "{\"JobIds\":[\"a1a2a2-b23-b34-d4d4d4\", \"x1a2a2-y23-y34-z4z4z4\"]}", count=2, "{\"JobIds\":[\"a1a9a9-b93-b04-d4d4d4\", \"x1a9a9-y93-y34-z4z4z4\"]}", count=3, "{\"JobIds\":[\"a1a9a9-b93-b04-d14d14d14\"]}")
    ``` above is test data setup ```
    ``` below is the actual query ```
    | rex field=message max_match=0 "\"(?<id>[\w\d]+\-[\w\d]+\-[\w\d]+\-[\w\d]+\")"
    | eval cnt=mvcount(id)
    | stats sum(cnt)
PM 77-1
  • This is pretty much what I did in the end, but instead I just used the regex to get the whole list of jobId's and split - `mvcount(split(job_ids, ","))` – Adjit Jan 23 '23 at 18:04
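The "match the whole list, then split" idea from the comment above can be sketched in Python as follows. This is an illustration rather than the OP's actual query: the pattern `LIST_PATTERN` and the helper `count_job_ids` are hypothetical, and it assumes each message holds a single `JobIds` array.

```python
import re

# The three sample messages from the question.
messages = [
    '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890039","db7a18ae-ea59-4987-87d5-c80adefa4475"]}',
    '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890040","db7a18ae-ea59-4987-87d5-c80adefa4489"]}',
    '{"JobIds":["661ce07c-b5f3-4b37-8b4c-a0b76d890070"]}',
]

# Capture everything between the brackets of the JobIds array in one group.
LIST_PATTERN = re.compile(r'"JobIds":\[([^\]]*)\]')

def count_job_ids(message):
    """Return how many comma-separated entries the JobIds array holds."""
    match = LIST_PATTERN.search(message)
    if match is None or not match.group(1).strip():
        return 0  # no array found, or an empty array
    return len(match.group(1).split(","))

# Per-message counts, then the overall total (like summing a per-event count).
per_event = [count_job_ids(m) for m in messages]
total = sum(per_event)

print(per_event, total)
```

Splitting on commas is safe here only because the IDs themselves never contain a comma.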