I am using C# to process transcripts that appear to have been produced by voice-to-text software. One major issue I am running into is repeated words and/or phrases. I would love to use a regular expression to replace them all. Here is an example:
I, I, I am really wanting to go, but I I am not, am not able to do it.
I would really like a regex replace to turn that into something like this:
I am really wanting to go, but I am not able to do it.
It appears that words are repeated multiple times, either with or without a comma between them. If I try a replace that looks for specific repeats, it collapses 2 of the 3 occurrences but still leaves two behind. So it's becoming a royal pain to come up with a way to look for multiple repeats and replace them with a single copy of the word: if I have "I, I, I", it should be replaced with "I", and "I I" should likewise be replaced with just one "I".
Also, if there are phrases like:
you know, you know you know
I would like to be able to replace the three with just one.
I've tried patterns like this:
\b(\w+)\s+\1\b
but it doesn't work when the repeats are separated by commas.
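For reference, here is roughly how I am calling it. This is only a minimal sketch of my attempt, not a solution, and the class and method names (RepeatDemo, CollapseAdjacent) are made up for this post:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatDemo
{
    // My attempted pattern: a word, then whitespace, then the same word again.
    // It only handles a single-word repeat separated by whitespace.
    public static string CollapseAdjacent(string input)
    {
        return Regex.Replace(input, @"\b(\w+)\s+\1\b", "$1");
    }

    public static void Main()
    {
        string input = "I, I, I am really wanting to go, but I I am not, am not able to do it.";
        Console.WriteLine(CollapseAdjacent(input));
        // Prints: I, I, I am really wanting to go, but I am not, am not able to do it.
    }
}
```

Only the "I I" in the middle gets collapsed. The comma-separated "I, I, I" and the repeated phrase "am not, am not" are left untouched.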
I have looked and can't really find anything that handles comma-separated repeats. I'm fine if it takes multiple calls; I'm just trying to figure it out.
Any help would be appreciated!