1

I'm experiencing odd analytics query behaviour when I add an index on time-series data similar to this:

{
    "clientId": "account-123",
    "orderCode": "49776599-cd7f-458a-b0f2-2d4f28aab85f",
    "timestamp": 1591869354000 // UNIX time milliseconds
}

With an index containing all three fields

index_clientId_timestamp_orderCode (clientId:string)(timestamp:int64)(orderCode:string)`

Odd behaviour

  1. I stop getting any results when I filter on clientId or orderCode
  2. If I add a filter on timestamp then the couchbase server errors.

Steps to reproduce on Couchbase Enterprise Edition 6.5.0 build 4960

Insert some data into a bucket

INSERT INTO `data` (KEY, VALUE)
 VALUES ("key1", {"clientId": "account-123", "orderCode": "49776599-cd7f-458a-b0f2-2d4f28aab85f", "timestamp": 1591869354000}),
 VALUES ("key2", {"clientId": "account-123", "orderCode": "8e2a8061-c69e-40ee-8ff4-e1fc6f03f76a", "timestamp": 1591869355000}),
 VALUES ("key3", {"clientId": "account-123", "orderCode": "757d33a6-85e7-4053-ad8b-bacc69ad0200", "timestamp": 1591869356000}),
 VALUES ("key4", {"clientId": "account-123", "orderCode": "c6022d17-80bf-497f-8f9d-f865896231dc", "timestamp": 1591869357000})

Create an analytics dataset:

CREATE DATASET orders ON `data` WHERE `clientId` = "account-123";

Run the following analytics query, which should work as expected:

SELECT `orders`.*
FROM `orders` orders
WHERE
  `clientId` = 'account-123'
  AND `orderCode` = '49776599-cd7f-458a-b0f2-2d4f28aab85f'
  AND `timestamp` > 1591860000000

Create an index so that the query will scale better:

CREATE INDEX index_clientId_timestamp_orderCode ON `orders`(`clientId`:STRING, `timestamp`:INT64, `orderCode`:STRING)

Run the query again:

SELECT `orders`.*
FROM `orders` orders
WHERE
  `clientId` = 'account-123'
  AND `orderCode` = '49776599-cd7f-458a-b0f2-2d4f28aab85f'
  AND `timestamp` > 1591860000000

and you should get an error:

[
  {
    "code": 25000,
    "msg": "Internal error",
    "query_from_user": "SELECT `orders`.*\nFROM `orders` orders\nWHERE\n  `clientId` = 'account-123'\n  AND `orderCode` = '49776599-cd7f-458a-b0f2-2d4f28aab85f'\n  AND `timestamp` > 1591860000000"
  }
]

When I look in /opt/couchbase/var/lib/couchbase/logs/analytics_error.log I see the following error

2020-06-11T14:58:34.561+00:00 ERRO CBAS.apache.asterix [Executor-104:ClusterController] Unexpected exception
java.lang.NullPointerException: null
        at org.apache.asterix.optimizer.rules.am.BTreeAccessMethod.createKeyVarsAndExprs(BTreeAccessMethod.java:773) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.BTreeAccessMethod.createIndexSearchPlan(BTreeAccessMethod.java:585) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.BTreeAccessMethod.applySelectPlanTransformation(BTreeAccessMethod.java:183) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.IntroduceSelectAccessMethodRule.checkAndApplyTheSelectTransformation(IntroduceSelectAccessMethodRule.java:431) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.IntroduceSelectAccessMethodRule.checkAndApplyTheSelectTransformation(IntroduceSelectAccessMethodRule.java:343) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.IntroduceSelectAccessMethodRule.checkAndApplyTheSelectTransformation(IntroduceSelectAccessMethodRule.java:343) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.IntroduceSelectAccessMethodRule.checkAndApplyTheSelectTransformation(IntroduceSelectAccessMethodRule.java:343) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.asterix.optimizer.rules.am.IntroduceSelectAccessMethodRule.rewritePre(IntroduceSelectAccessMethodRule.java:168) ~[asterix-algebra.jar:6.5.0-4960]
        at org.apache.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:86) ~[algebricks-core.jar:6.5.0-4960]
        at org.apache.hyracks.algebricks.compiler.rewriter.rulecontrollers.SequentialFixpointRuleController.rewriteWithRuleCollection(SequentialFixpointRuleController.java:52) ~[algebricks-compiler.jar:6.5.0-4960]
        at org.apache.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.runOptimizationSets(HeuristicOptimizer.java:89) ~[algebricks-core.jar:6.5.0-4960]
        at org.apache.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.runLogicalOptimizationSets(HeuristicOptimizer.java:80) ~[algebricks-core.jar:6.5.0-4960]
        at org.apache.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.optimize(HeuristicOptimizer.java:61) ~[algebricks-core.jar:6.5.0-4960]
        at org.apache.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.optimize(HeuristicCompilerFactoryBuilder.java:94) ~[algebricks-compiler.jar:6.5.0-4960]
        at org.apache.asterix.api.common.APIFramework.compileQuery(APIFramework.java:255) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.translator.QueryTranslator.rewriteCompileQuery(QueryTranslator.java:1982) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.translator.QueryTranslator.lambda$handleQuery$3(QueryTranslator.java:2491) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.translator.QueryTranslator.createAndRunJob(QueryTranslator.java:2623) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.translator.QueryTranslator.deliverResult(QueryTranslator.java:2536) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.translator.QueryTranslator.handleQuery(QueryTranslator.java:2506) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.translator.QueryTranslator.compileAndExecute(QueryTranslator.java:399) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.app.message.ExecuteStatementRequestMessage.handle(ExecuteStatementRequestMessage.java:153) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.asterix.messaging.CCMessageBroker.receivedMessage(CCMessageBroker.java:64) ~[asterix-app.jar:6.5.0-4960]
        at org.apache.hyracks.control.cc.work.ApplicationMessageWork.lambda$notifyMessageBroker$0(ApplicationMessageWork.java:68) ~[hyracks-control-cc.jar:6.5.0-4960]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]

Removing the timestamp filter from the predicate results in the search returning as expected. However, when I do effectively the same operation on a larger dataset for my employer I get no matches at all. I haven't been able to reproduce that behaviour with the above example though.

Any help appreciated :)

Lewis Watson
  • 543
  • 1
  • 4
  • 15

1 Answers1

1

This seems to be a bug in the optimizer.

Try creating two indexes instead:

CREATE INDEX index_clientId_orderCode ON `orders`(`clientId`:STRING, `orderCode`:STRING)
CREATE INDEX index_timestamp ON `orders`(`timestamp`:BIGINT)

Also, the orderCode value looks unique (or low cardinality). Could you just create an index on that and let the remaining filters be computed at query execution time?

CREATE INDEX index_orderCode ON `orders`(`orderCode`:STRING)

Another workaround is to keep the existing index definition, but add an upper bound filter on timestamp:

... AND `timestamp` > 1591860000000 and `timestamp` < 9999999999999999
  • 1
    Thanks for the suggestions Dmitry, I'll try them out and report back. – Lewis Watson Jun 12 '20 at 15:52
  • I've been running benchmark tests with different indexes and the best combination I've come up with so far is to have one index on clientId AND timestamp, then another on clientId AND orderCode. Any other combination either results in the above bug, or worse overall performance. – Lewis Watson Jun 22 '20 at 15:27