11

In Azure Data Explorer, I am trying to use both the 'project' and 'distinct' keywords. The table records have 3 fields I want to use the 'project' on:

  1. CowName
  2. CowType
  3. CowNum
  4. CowLabel

But there are many other fields in the table such as Date, Measurement, etc, that I do not want to return.

Cows
| project CowName, CowType, CowNum, CowLabel

However, I want to avoid duplicate records of CowName and CowNum, so I included

Cows
| project CowName, CowType, CowNum, CowLabel
| distinct CowName, CowNum

But when I do this, the only columns that are returned are CowName and CowNum. I am now missing CowType and CowLabel entirely.

Is there a way to use both 'project' and 'distinct' without them interfering with each other? Is there a different approach I should take?

Adam
  • 155
  • 1
  • 2
  • 9

2 Answers2

12

You can do:

Cows
| distinct CowName, CowType, CowNum

or, if you don't want to have distinct values of CowType - and just have any value of it:

Cows
| summarize any(CowType) by CowName, CowNum

References:
Summarize operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/summarizeoperator
Distinct operator:https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/distinctoperator
any() aggregation function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/any-aggfunction

Alexander Sloutsky
  • 2,827
  • 8
  • 13
3

You can use this

| summarize any(CowType, CowLabel) by CowName, CowNum

To visualize how this will work take the following sample table/query:

let CowTable = datatable(CowNum:int, CowName:string, CowType:string, CowLabel:string, DontWantThis:int) 
[
   1, "Bob", "Bull", "label1", 99,
   2, "Tipsy", "Heifer", "label1", 98,
   3, "Milly", "Heifer", "label2", 99,
   4, "Bob", "Bull", "label2", 87,
   4, "Bob", "Bull", "label2", 77,
   2, "Hanna", "Heifer", "label1", 98,
];
CowTable
| summarize any(CowType, CowLabel) by CowName, CowNum

Results:

sample results output from KQL query

Note that we do not see CowNum 4 listed twice, however we do see CowNum 2 listed twice; this is because those rows are unique in regard to the CowName & CowNum. We also see Bob listed twice (not 3 times); this is because 2 of the Bob entries are unique in regard to CowName/CowNum, but 2 of the Bob entries are not unique in regard to CowName/CowNum.

If you truly only want results where the CowName is unique and the CowNum is also distinct you can do this in a 2-step summarize:

CowTable
| summarize any(CowName, CowType, CowLabel) by CowNum
| summarize any(CowNum, any_CowType, any_CowLabel) by any_CowName
//normalize column names
| project CowNum = any_CowNum, CowName = any_CowName, CowType = any_any_CowType, CowLabel = any_any_CowLabel

Results:

Alternate KQL query results

SendETHToThisAddress
  • 2,756
  • 7
  • 29
  • 54