In my HPCC cluster, what does the workunit error "Error receiving actinit data for graph: xx" really mean?

Question

Note: I am running code in a cluster with 16 slaves, HPCC version 6.4.40

I am running some ECL code that returns this error:

System error: 0: Graph graph2[14], SLAVE #1 [10.313.316.31:20100]: Error receiving actinit data for graph: 14

What exactly does this error indicate?

Is it maybe running out of memory?

In the Thor master log, just before the exception, I can see two log lines, the first starting with NIC (network interface?) and the other with SYST (system?). The values don't seem to change drastically:

score 1 · Answer 1 · answered Jun 08 '23 at 16:03

From the development team:

Why you are seeing that error:

There are a lot of logical files at the same scope level, which significantly slows down file access (lookups). If hundreds or thousands of files are being looked up for a single read, the total lookup time exceeds the timeout.

Scopes with a lot of logical files at the same level like this used to be a pain point for Dali and for clients accessing files within them. Basically, each lookup performed a linear search through the scope for a match. NB: that was fixed some years ago (in 7.12.0).

So my guess is that the number of files in the scopes being accessed by this query (that haven't been rolled up?) has grown, and the cumulative time to look them up now exceeds the [25 minute] timeout.
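To give a feel for why the cumulative cost grows so quickly, here is a toy model in Python. It is only a sketch of a generic linear scan, not Dali's actual implementation; the function names and the `myscope::fileN` naming are made up for illustration.

```python
def make_scope(n):
    """Build a hypothetical flat scope holding n logical file names."""
    return [f"myscope::file{i}" for i in range(n)]


def count_scan_comparisons(scope, wanted):
    """Count name comparisons when every lookup scans the scope linearly."""
    total = 0
    for name in wanted:
        for candidate in scope:
            total += 1
            if candidate == name:
                break  # found it, stop scanning for this lookup
    return total


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        scope = make_scope(size)
        # Look up every file in the scope once, as a read over all of them would.
        print(size, count_scan_comparisons(scope, scope))
```

In this model, looking up every file in a scope of n files costs n(n+1)/2 comparisons, so ten times as many files means roughly a hundred times the work. That is the kind of growth that can push a read past a fixed timeout as a scope fills up.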

I recommend you roll up your files and/or upgrade your cluster as soon as possible. The current gold release is now up to version 9.

Hope this helps,

Bob
