I have an Azure SQL indexer with more than 600k rows in the view I need to index. The view takes a couple of minutes to compute initially, which appears to cause problems when resetting the indexer or changing its configuration.
private readonly SearchIndexerClient _indexerClient;

private async Task<SearchIndexer> CreateOrUpdateIndexer(string indexName, string indexerName,
    ISqlServerIndexerConnectionProvider connectionProvider, IIndexerConfigurationProvider cfgProvider)
{
    var indexer = new SearchIndexer(name: indexerName, dataSourceName: indexerName, indexName)
    {
        Schedule = new IndexingSchedule(cfgProvider.IndexingSchedule ?? TimeSpan.FromMinutes(5)),
        Parameters = GetIndexerParameters(connectionProvider, cfgProvider)
    };

    if (await _indexerClient.IndexerExistsAsync(indexerName))
    {
        await _indexerClient.ResetIndexerAsync(indexerName); // <<--- This one times out
    }

    return await _indexerClient.CreateOrUpdateIndexerAsync(indexer);
}
Based on a profiling session, I can conclude that the data source (in my case, a SQL view) is queried after the indexer configuration is changed or the indexer is reset, but before the full re-index, which is a long-running operation. It is unclear to me why that is required, but I thought it would be possible to set a longer timeout on the client, which I tried:
var clientOpts = new Azure.Search.Documents.SearchClientOptions
{
    Retry = { NetworkTimeout = TimeSpan.FromMinutes(15) }
};

return new SearchIndexerClient(
    MakeEndpointFromOpts(options),
    new AzureKeyCredential(options.AdminKey),
    clientOpts // <<--- Configured the network timeout when building the Indexer Client
);
But to no avail: according to the Azure portal, the reset operation ran for 100 ms, multiple times in a row, and I get a "Task was cancelled" exception when calling ResetIndexerAsync.
My current hypothesis is that it is not the HTTP client that times out, but the "reset" operation inside the search engine itself, which awaits the call to the data source. It is not clear to me whether this is configurable, whether I have to work around this limitation, or even why the reset operation needs to call the data source.
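One way this hypothesis could be checked is to time the call and record how it ends. The following is a minimal sketch, not something I have confirmed against the service: the helper name ResetTiming.MeasureAsync is hypothetical, and it only measures elapsed wall-clock time on the client side. If the elapsed time is far below the configured NetworkTimeout, that would suggest the cancellation is not caused by the client-side network timeout.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Azure SDK): wraps any async call,
// measures how long it ran, and reports whether it completed, was
// cancelled, or failed with another exception.
public static class ResetTiming
{
    public static async Task<(TimeSpan Elapsed, string Outcome)> MeasureAsync(Func<Task> call)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await call();
            return (stopwatch.Elapsed, "completed");
        }
        catch (OperationCanceledException ex)
        {
            // TaskCanceledException derives from OperationCanceledException,
            // so "Task was cancelled" failures end up here.
            return (stopwatch.Elapsed, $"cancelled: {ex.Message}");
        }
        catch (Exception ex)
        {
            // Anything else, e.g. an error response surfaced by the client.
            return (stopwatch.Elapsed, $"failed: {ex.GetType().Name}: {ex.Message}");
        }
    }
}
```

Assuming the _indexerClient field from the first snippet, it could be used like this:

```csharp
var (elapsed, outcome) = await ResetTiming.MeasureAsync(
    () => _indexerClient.ResetIndexerAsync(indexerName));
Console.WriteLine($"Reset ended after {elapsed.TotalSeconds:F1}s: {outcome}");
```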
Is there any configuration I can apply to avoid this behavior? This is only an issue when resetting the indexer or doing a full crawl. Every incremental crawl after that takes less than a minute on average.