9

I'm writing a node app using typescript and the aws-sdk to call Athena. Occasionally, the call to Athena will timeout after a minute of waiting for the Athena response. The thing I don't understand is that if I go to the AWS console and look at the Athena History tab, it says my query took 4 seconds to succeed, yet in my code it runs for over a minute waiting for a response. I'm hoping there's something I'm doing wrong in my code.

myFunct... {
    const athena = getAthena();

    const startQueryExecutionResponse: StartQueryExecutionOutput = await athena.startQueryExecution(params).promise();

    let status = "UNKNOWN";
    do { 
        await sleep(1000);
        status = await getQueryStatus(athena, startQueryExecutionResponse);  
    } while (status === "QUEUED" || status === "RUNNING");

    return await athena.getQueryResults(startQueryExecutionResponse as GetQueryResultsInput).promise();
};

const sleep = (ms: number) => {
    return new Promise(resolve => {
        setTimeout(resolve, ms)
    })
};

const getQueryStatus = async (athena: AWS.Athena, startQueryExecutionResponse: Athena.StartQueryExecutionOutput): Promise<QueryExecutionState> => {
    const result: GetQueryExecutionOutput = await athena.getQueryExecution(startQueryExecutionResponse as GetQueryResultsInput).promise();
    console.log(result);
    return result!.QueryExecution!.Status!.State || "Unknown"
};

The code gets stuck in that while loop, and I can see from the console.log below, that the status is Status: { State: 'RUNNING', SubmissionDateTime: 2019-07-23T18:28:00.639Z }, over and over again.

If the AWS console says this query took 4 seconds to complete, why does the getQueryExecution response say it's still RUNNING for minutes?

bug-a-lot
  • 2,446
  • 1
  • 22
  • 27
Daniel Kaplan
  • 62,768
  • 50
  • 234
  • 356
  • Can you try to add a little delay between your calls to `getQueryStatus`? Without that it could potentially be a weird API limit... – Maurice Jul 23 '19 at 19:42
  • @Maurice I updated my answer. I was accidentally calling `getQueryResults` twice per second instead of once and it timed out from that. The second time I tried (with the code above), it worked, but I'm 99% sure that's because the query result was cached at that point. – Daniel Kaplan Jul 23 '19 at 20:32
  • @Maurice with a 1 second delay, it still times out. – Daniel Kaplan Jul 23 '19 at 21:05
  • Have you had any resolution on this? – bug-a-lot Feb 12 '20 at 08:18
  • Yes it is writing the data to s3 behind the scenes while your query says it's completed in Athena console. – Prabhakar Reddy Feb 15 '20 at 06:49

2 Answers2

1

The history in the Athena console shows execution time. In addition to execution time, there is time in queue, planning, and service processing. I've had queries stuck in queue for much longer than the execution time. I think this is likely in your case. Also, the Athena showed queued queries as "RUNNING" until recently.

More info here: AWS Athena concurrency limits: Number of submitted queries VS number of running queries

Example stats of a query stuck in queue:

"Statistics": {
  "EngineExecutionTimeInMillis": 2689,
  "DataScannedInBytes": 35347476,
  "TotalExecutionTimeInMillis": 106676,
  "QueryQueueTimeInMillis": 103971,
  "QueryPlanningTimeInMillis": 600,
  "ServiceProcessingTimeInMillis": 16
}
crumkev
  • 301
  • 1
  • 5
0

It is possible to get the QueryExecutionStatistics for your getQueryExecution function.

         "DataManifestLocation": "string",
         "DataScannedInBytes": number,
         "EngineExecutionTimeInMillis": number,
         "QueryPlanningTimeInMillis": number,
         "QueryQueueTimeInMillis": number,
         "ServiceProcessingTimeInMillis": number,
         "TotalExecutionTimeInMillis": number

That might give you more insight, if you haven't already checked it. It could be the case that the 4 seconds is a subset of the total execution time. Perhaps your query is spending more time in planning as opposed to executing.

EngineExecutionTimeInMillis The number of milliseconds that the query took to execute.

Type: Long

Required: No

QueryPlanningTimeInMillis The number of milliseconds that Athena took to plan the query processing flow. This includes the time spent retrieving table partitions from the data source. Note that because the query engine performs the query planning, query planning time is a subset of engine processing time.

Type: Long

Required: No

source

jeeves
  • 1,871
  • 9
  • 25