aligning sentences to corpus and finding mismatches

Question

The ideal goal is to correct the output of a speech2text model according to a reference corpus (the actual text). I don't mind using any off-the-shelf tool, either in the NLP space or Elasticsearch.

I have a reference corpus like the following:

It is a reliance that has led to a cycle of addiction that has destroyed lives it is a cycle that makes you sick when you try to stop and potentially takes your life when you don't and beyond its physical effects this cycle of addiction also includes constant contact with the criminal justice system and not just a cycle of arrests release and violation.

In fact, it's much longer ...

On the other hand, I have a set of sentences, recognized by a speech2text model, stored in a CSV file:

1, is a cycle that makes you dick when
2, try two stops and essentially hates your
3, posses activated
4, lives when who don't and beyond
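Loading these rows needs nothing beyond Python's standard `csv` module. A minimal sketch, assuming each row has the form `id, text` with no quoted commas inside the text; the four rows above are inlined as a string so the example runs on its own.

```python
import csv
import io

# The four recognized fragments from the question, exactly as they appear in the CSV.
RAW_CSV = """1, is a cycle that makes you dick when
2, try two stops and essentially hates your
3, posses activated
4, lives when who don't and beyond
"""


def load_fragments(text):
    """Return a list of (id, fragment) tuples from 'id, text' CSV rows."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [(int(row[0]), row[1].strip()) for row in reader if row]


fragments = load_fragments(RAW_CSV)
for frag_id, frag in fragments:
    print(frag_id, frag)
```

For a real file, `csv.reader` accepts any open file object, e.g. one opened with `open(path, newline="")`; the path is whatever your CSV is called.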

As you can see, because the speech2text model is not perfect, there are errors. For example:

1) Relative to the corpus, some subsentences are misrecognized (e.g. "dick" instead of "sick" in sentence 1).
2) Some sentences do not match the corpus at all (e.g. sentence 3).
3) Putting the sentences together does not cover the whole paragraph.
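One way to frame this task is word-level sequence alignment (approximate string matching) between the recognizer's output and the reference text; the same kind of word alignment is used to score speech recognizers with word error rate (WER). Below is a minimal sketch using only Python's standard-library `difflib`, whose `SequenceMatcher` is a Ratcliff/Obershelp-style matcher rather than a Levenshtein aligner. Each fragment is compared with every same-length window of the reference and the best window is kept; the three problems above then fall out of the result: word substitutions inside the window (problem 1), fragments whose best score stays below a threshold (problem 2), and reference spans no fragment covers (problem 3). The `MIN_SCORE` value of 0.3 and the fixed window length are assumptions for this toy example, not tuned settings.

```python
import difflib
import re

# Excerpt of the reference corpus from the question.
REFERENCE = (
    "It is a reliance that has led to a cycle of addiction that has destroyed "
    "lives it is a cycle that makes you sick when you try to stop and potentially "
    "takes your life when you don't and beyond its physical effects this cycle of "
    "addiction also includes constant contact with the criminal justice system and "
    "not just a cycle of arrests release and violation."
)

# Recognized fragments from the CSV, keyed by row id.
FRAGMENTS = {
    1: "is a cycle that makes you dick when",
    2: "try two stops and essentially hates your",
    3: "posses activated",
    4: "lives when who don't and beyond",
}

MIN_SCORE = 0.3  # assumed threshold; tune it on your own data


def tokenize(text):
    """Lower-case the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def best_window(frag_tokens, ref_tokens):
    """Return (score, start, end) of the same-length reference window most
    similar to the fragment; the earliest window wins ties."""
    n = len(frag_tokens)
    best = (0.0, 0, 0)
    for start in range(len(ref_tokens) - n + 1):
        window = ref_tokens[start:start + n]
        score = difflib.SequenceMatcher(None, frag_tokens, window).ratio()
        if score > best[0]:
            best = (score, start, start + n)
    return best


def word_diffs(frag_tokens, window):
    """List (recognized, reference) pairs for every non-matching word run."""
    sm = difflib.SequenceMatcher(None, frag_tokens, window)
    return [(" ".join(frag_tokens[i1:i2]), " ".join(window[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def align(fragments, reference):
    ref_tokens = tokenize(reference)
    covered = [False] * len(ref_tokens)
    report = {}
    for frag_id, text in fragments.items():
        frag_tokens = tokenize(text)
        score, start, end = best_window(frag_tokens, ref_tokens)
        if score < MIN_SCORE:
            report[frag_id] = None  # problem 2: no counterpart in the corpus
            continue
        for i in range(start, end):
            covered[i] = True
        report[frag_id] = (round(score, 2), start, end,
                           word_diffs(frag_tokens, ref_tokens[start:end]))
    # Problem 3: collect runs of reference tokens that no fragment covered.
    gaps, i = [], 0
    while i < len(covered):
        if covered[i]:
            i += 1
            continue
        j = i
        while j < len(covered) and not covered[j]:
            j += 1
        gaps.append((i, j))
        i = j
    return report, gaps


report, gaps = align(FRAGMENTS, REFERENCE)
for frag_id, result in report.items():
    print(frag_id, result)
print("uncovered reference token spans:", gaps)
```

With the four fragments above, fragment 3 comes back as `None` and fragment 1 reports the substitution `('dick', 'sick')`. Because the window length equals the fragment length, a fragment with extra or missing words may align slightly off at its edges, and scanning every window is only practical for short texts; for a long corpus, one option is to narrow the candidates first with a full-text search (such as the Elasticsearch the question mentions) and only then align.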

So basically, I wonder what this task is called in NLP, so that I can search for it more effectively. I would also appreciate it if you could name specific functions or examples I can leverage, e.g. in spaCy, NLTK, or any other tool.

Edit: I already have experience with NLP (Coursera certificate), so I am looking for a concrete answer and/or example rather than a scientific paper. This is not a general error-correction task, nor next-word recommendation based on sequential models.

score 0 · Answer 1 · answered Oct 11 '19 at 13:05


The most suitable NLP technique for this is probably a language model. A language model predicts the likelihood of a word given the previous words (or the surrounding words), so it can be used for error correction.
You may find the following useful:
article
page
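To make the language-model idea concrete, here is a toy sketch (not the method from the linked article): it trains an add-one-smoothed bigram model on the reference corpus itself, proposes similarly spelled corpus words with `difflib.get_close_matches`, and greedily keeps the candidate the model rates most likely after the previous, already corrected word. Training on the reference text, the bigram order, and the 0.6 similarity cutoff are all assumptions for illustration; a practical system would use a much larger model.

```python
import difflib
import re
from collections import Counter

# Excerpt of the reference corpus from the question; in practice, use all of it.
REFERENCE = (
    "It is a reliance that has led to a cycle of addiction that has destroyed "
    "lives it is a cycle that makes you sick when you try to stop and potentially "
    "takes your life when you don't and beyond its physical effects this cycle of "
    "addiction also includes constant contact with the criminal justice system and "
    "not just a cycle of arrests release and violation."
)


def tokenize(text):
    """Lower-case the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


ref_tokens = tokenize(REFERENCE)
unigrams = Counter(ref_tokens)
bigrams = Counter(zip(ref_tokens, ref_tokens[1:]))
vocab = sorted(unigrams)


def bigram_prob(prev, word):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))


def correct(sentence, cutoff=0.6):
    """Greedy left-to-right correction: for each recognized word, gather
    similarly spelled corpus words and keep the one the bigram model rates
    most likely after the previous (already corrected) word."""
    out = []
    for word in tokenize(sentence):
        candidates = difflib.get_close_matches(word, vocab, n=5, cutoff=cutoff)
        if not candidates:
            out.append(word)  # nothing similar in the corpus: keep it as recognized
        elif not out:
            # First word has no left context, so fall back to unigram counts.
            out.append(max(candidates, key=lambda c: unigrams[c]))
        else:
            prev = out[-1]
            out.append(max(candidates, key=lambda c: bigram_prob(prev, c)))
    return " ".join(out)


print(correct("is a cycle that makes you dick when"))
print(correct("try two stops and essentially hates your"))
print(correct("posses activated"))
```

With this excerpt, the first two fragments come out as "is a cycle that makes you sick when" and "try to stop and potentially takes your", while "posses activated" is left unchanged because no corpus word is spelled similarly enough.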


DBaker


I am looking for a more concrete answer and/or example. This is not a general error-correction task, nor next-word recommendation based on sequential models. – Areza Oct 11 '19 at 13:17
When you edit your question after an answer has been posted, you should start the added paragraph with the word "Edit:". – DBaker Oct 11 '19 at 13:27
Thanks for reminding me, but that shouldn't be a reason to downvote. – Areza Oct 11 '19 at 13:28

score 0 · Answer 2 · answered Oct 17 '19 at 08:30

Why do you think this is "not a general error correction task"? I think it is. You could look into 'grammar correction' or 'sentence validity'.

Sentence validity is discussed in "How to check whether a sentence is correct (simple grammar check in Python)?". The tools listed there also provide suggestions and could therefore be useful to you.

