This is a question for Azure Cognitive Search team.

We are constantly running into issues with the hit-highlighting mechanism in Azure Cognitive Search. The maximum size of a highlight is limited to 1,000 characters and cannot be increased through API parameters.

The problem is that we fairly often see highlights with no keywords highlighted in them at all. These 'highlights' are exactly 1,000 characters long, so they were very likely just cropped to fit the 1,000-character limit. There is not much point in showing our users a highlight if the hits are not actually highlighted.

What is the point of trimming the highlight without any logic behind it? Sometimes the highlight is even cropped right in the middle of a match, so that it ends with text like ' ... some highlighted text [match]keyword[/ma'. As you can see, the closing tag was cropped: we get '[/ma' instead of '[/match]'.

How do you expect somebody to use this? Is there any workaround?

Eramir

1 Answer

I am an engineer on the Azure Cognitive Search team. We are aware of these edge cases with the highlight trimming and apologize for the negative impact on your use-case. This is a recent change intended to serve as a stop-gap measure against service stability issues arising from highlighting extremely large fragments.

We are working on upgrading the hit-highlighting experience overall, and it will be available to customers from 15 July 2020. More details can be found here. However, the new experience is only enabled for services created after that date. For older services, the only workaround at the moment is to pre-process the field text so that each sentence (the highlighting boundary) is shorter than 1,000 characters.
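As a rough illustration of that pre-processing workaround, here is a minimal Python sketch that breaks text into pieces shorter than 1,000 characters. It is not the service's own logic: the function name `split_long_text` is hypothetical, and the rule for what counts as a sentence boundary (punctuation followed by whitespace) is an assumption, since the exact boundary detection used by the highlighter is not described here.

```python
import re

MAX_LEN = 1000  # limit quoted in this answer


def split_long_text(text, max_len=MAX_LEN):
    """Split text into pieces that are each shorter than max_len characters.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    The highlighter's real boundary rules may differ.
    """
    # First pass: split after sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces = []
    for sentence in sentences:
        # Second pass: break any sentence that is still too long.
        while len(sentence) >= max_len:
            # Prefer the last space before the limit, so words stay intact.
            cut = sentence.rfind(" ", 0, max_len)
            if cut <= 0:
                cut = max_len - 1  # no space available: hard cut
            pieces.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if sentence:
            pieces.append(sentence)
    return pieces
```

How the resulting pieces are written back to the field (for example, joined with sentence punctuation or newlines) and where this runs (before the documents are uploaded, or in a custom skill attached to the indexer) depends on your setup and would need to be verified against the actual highlighting behavior.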

Feel free to reach out to the PG at azuresearch_contact@microsoft.com with more details about your scenario and we will try our best to alleviate your issues.

  • Hi Ishan, thank you for your response. How can I pre-process the field text? I am uploading contract documents in PDF, DOCX and other formats, and Cognitive Search is retrieving the results automatically, so I am not able to pre-process them. Maybe there is some skill that I should apply? Honestly, we are facing this problem quite frequently, and for us it is killing the whole highlighting feature. Also, how does the highlighting mechanism determine the boundaries of sentences? By newline characters? – Eramir May 23 '20 at 10:58
  • Documents are added to the index automatically using an indexer, which takes the documents from Azure Blob Storage – Eramir May 23 '20 at 11:05
  • can you please help me with this? – Eramir May 25 '20 at 18:33
  • Responded directly to your email. – Ishan Srivastava May 26 '20 at 23:13