The ideal goal is to correct the output from a speech2text model according to a reference corpus (the actual text). I don't mind using any off the selves tool either in NLP space or ElasticSearch
I have a reference corpus like the following:
It is a reliance that has led to a cycle of addiction that has destroyed lives it is a cycle that makes you sick when you try to stop and potentially takes your life when you don't and beyond its physical effects this cycle of addiction also includes constant contact with the criminal justice system and not just a cycle of arrests release and violation.
In fact its much longer ...
On the other hand, I have a set of sentences that are recognized from a speech-2-text model in a CSV files
1, is a cycle that makes you dick when
2, try two stops and essentially hates your
3, posses activated
4, lives when who don't and beyond
As you can see there because the speech2text model is not perfect there are errors, for example
1) With references to the corpus these subsentences are misspelled (e.g. dick instead of sick in number the sentence number 1 2) there are sentences that do not match to the corpus at all - e.g. number 3 3) putting the sentences together does not cover the whole paragraph.
So basically I wonder what is this task called in the NLP topic, then I can do a better googling, and I appreciate if you name specific functions or examples that I can leverage, e.g. in Space or NLTK or any other tool.
edit : * I already have experience with nlp (coursera certificate) - therefore, looking for a concrete answer and/or example rather a scientific paper. This is not a general error correction task or the next work recommendation based on sequential models.