6

I have a Splunk query that looks something like this:

index=myIndex* source="source/path/of/logs/*.log" "Elephant"

This brings up about 2,000 results, which are JSON responses from one of my APIs that include the word "Elephant". This is more or less what I want. However, some of these results have duplicate carId fields, and I only want Splunk to show me the unique search results.

The results in Splunk look something like this:

MyApiRequests {"carId":3454353435,"make":"toyota","year":"2015","model":"camry","value":25000.00}

Now I just want to keep the results whose carId values are unique, with no duplicates. I would therefore expect the original 2,000 results to decrease quite a bit.

Can anyone help me formulate a Splunk query to achieve this?

ennth

2 Answers

7

stats will be your friend here.

Consider the following:

index=myIndex* source="source/path/of/logs/*.log" "Elephant" carId=*
| stats values(*) as * by carId
warren
  • Interesting. When I try this, I get 0 results back. – ennth May 06 '21 at 20:17
  • 1
    This answer and @Mads Hansen's presume the `carId` field is already extracted. If it isn't, then neither query will work. The fields can be extracted automatically by specifying either `INDEXED_EXTRACTIONS=json` or `KV_MODE=json` in props.conf. Otherwise, you can use the `spath` command in a query. Either way, the JSON must be in the correct format. For improper JSON, you can use `rex` to extract fields. – RichG May 07 '21 at 00:03
  • @RichG - ennth indicated the field _seems_ to be "available" already – warren May 10 '21 at 14:33
  • Yes, if you do "fields carId" or "carId=*" as the post stated, it will automatically extract the field "carId" with those values. You can see it in the left sidebar of Splunk, where the extracted field is listed. For some reason, I can only get this to work with results in my _raw area that are in key=value format. The only thing I can't figure out now is that stats values() never returns unique values for me, despite everyone saying it returns only unique values. – ennth Nov 11 '21 at 09:13
  • @ennth - are you sure you have the spelling on the field name correct? – warren Nov 15 '21 at 12:57
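
RichG's suggestion of `rex` plus `spath` could be sketched roughly as follows. This assumes the events look like the sample in the question, where a `MyApiRequests` prefix comes before the JSON object, so the raw event as a whole is not valid JSON. The field name `json` is made up for illustration, and the query has not been checked against the poster's data:

```
index=myIndex* source="source/path/of/logs/*.log" "Elephant"
| rex field=_raw "(?<json>\{.*\})"
| spath input=json
| stats values(*) as * by carId
```

Here `rex` captures the JSON portion of `_raw` into `json`, `spath` extracts its keys (including `carId`) as fields, and `stats` then produces one row per `carId`. If the search still returns 0 results, the `carId` field is probably not being extracted, which would be worth checking before the `stats` step.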
4

You could use dedup

index=myIndex* source="source/path/of/logs/*.log" "Elephant" | dedup carId 
Mads Hansen
  • Okay, I tried piping the results (there were 2,000) into dedup and I got 0 events back. I expected to get a filtered list of the results. I'm assuming that if I had, say, 5 duplicates, this would have been returned to me. Is this how dedup works? – ennth May 06 '21 at 20:11
  • 1
    You *can* use `dedup`. But you generally *shouldn't*. It's a very inefficient operation in Splunk. – warren May 06 '21 at 20:12
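
As a quick check on whether duplicates exist at all, and as a `stats`-based alternative to `dedup`, something like the following may help. It is only a sketch and assumes the `carId` field is already extracted at search time:

```
index=myIndex* source="source/path/of/logs/*.log" "Elephant" carId=*
| stats count by carId
| sort - count
```

This returns one row per distinct `carId` along with the number of events that share it, so any `carId` with a count above 1 is a duplicate. If this also returns 0 results, the problem is most likely field extraction rather than `dedup` or `stats` themselves.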