3

I am trying to do a groupby virtual column in a Druid native query which looks like this...

{
  "queryType": "groupBy",
  "dataSource": "trace_info",
  "granularity": "none",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "tenant",
      "expression": "replace(array_offset(tags, array_offset_of(tagNames, 'tenant')), 'tenant:', '')"
    },
    {
      "type": "expression",
      "name": "rc",
      "expression": "replace(array_offset(tags, array_offset_of(tagNames, 'row_count')), 'row_count:', '')"
    }
  ],
  "dimensions": [
    "tenant"
  ],
  "aggregations": [
    {
      "type": "longSum",
      "name": "trc",
      "fieldName": "rc"
    }
  ],

...
...
...

  "intervals": [
    "..."
  ]
}

This gives out a single row with longsum of all row_counts as if the groupBy column is null.

Is my usage correct or is this a known issue in Druid. The documentation says virtual columns can be used like normal dimensions but, is not very clear on how or even a working example is missing.

Thanks! Phani

Olaf Kock
  • 46,930
  • 8
  • 59
  • 90
PhaKuDi
  • 141
  • 8
  • Even if I use a default dimension spec for accessing the virtual columns it is the same result. – PhaKuDi Aug 06 '20 at 09:49
  • I've successfully used groupBy with virtual columns just like in your example. Are you sure that the expression works correctly? Might be worth testing that expression in a simpler query. – legoscia Aug 06 '20 at 09:52
  • yes the expression is correct I tried a scan query with the same expression and the virtual columns shows fine in the results. I am using Druid 0.18.1 if that helps. – PhaKuDi Aug 06 '20 at 09:53
  • @legoscia in your usage were you summing (aggregating) a VC too? – PhaKuDi Aug 06 '20 at 09:56
  • please read the [tag:apache] tag description before you add it again. And if you do: Describe its relevancy by editing the question, because currently there's none. – Olaf Kock Aug 06 '20 at 09:57
  • No, just grouping on it. – legoscia Aug 06 '20 at 09:57

1 Answers1

0

Latest Edit...

Some more digging to find out that the issue was with missing "outputType" attributes on the the virtual columns. Strange because the aggregator is able to auto-detect time and calculate the long sum properly even though the group by results were wrong.

  "virtualColumns": [
    {
      "type": "expression",
      "name": "tenant",
      "expression": "replace(array_offset(tags, array_offset_of(tagNames, 'tenant')), 'tenant:', '')",
      "outputType": "STRING"
    },
    {
      "type": "expression",
      "name": "rc",
      "expression": "replace(array_offset(tags, array_offset_of(tagNames, 'row_count')), 'row_count:', '')"
      "outputType": "LONG"
    }
  ],

See above (below is likely a non-performant way of working around the problem).

After some trial and error I have a workaround for this using extraction dimensions. Although not sure, I suspect that this is a temporary issue in Druid 0.18.1. Hopefully Grouping on VCs will work as advertised in future builds.

{
  "queryType": "groupBy",
  "dataSource": "trace_info",
  "granularity": "none",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "tenant",
      "expression": "replace(array_offset(tags, array_offset_of(tagNames, 'tenant')), 'tenant:', '')"
    },
    {
      "type": "expression",
      "name": "rc",
      "expression": "replace(array_offset(tags, array_offset_of(tagNames, 'row_count')), 'row_count:', '')"
    }
  ],
  "dimensions": [
    {
      "type": "extraction",
      "dimension": "tenant",
      "outputName": "t",
      "extractionFn": {
        "type" : "substring", "index" : 1
      }
    }
  ],
  "aggregations": [
    {
      "type": "longSum",
      "name": "trc",
      "fieldName": "rc"
    }
  ],

...
...
...

  "intervals": [
    "..."
  ]
}
PhaKuDi
  • 141
  • 8
  • This is not an answer - please edit your question with this information instead of answering. You might want to take the [tour] or read [ask] – Olaf Kock Aug 07 '20 at 09:19
  • ...or is it? Reading this again I'm not clear if it gives further explanation to your question or actually solves the problem... Which is it? – Olaf Kock Aug 07 '20 at 14:10
  • Please use the *Post Your Answer* button only for actual answers. You should [edit](https://stackoverflow.com/q/63280959/3773011) your original question to add additional information. What you've put here appears to be partially additional information for your question and partially a workaround/answer. Please edit both your question and this answer such that all the question material is in the question and everything for the answer is here in the answer. – Makyen Aug 07 '20 at 22:47