I'm not sure whether R is the right tool for this, but here's my situation: I have a character vector full of strings.
id   Words
1    'The'
2    'victory'
3    'wasgreat'
...  ...
The original data had some encoding problems, and some of the strings are several words run together (e.g. 'My name is' -> 'Mynameis').
I need to leave the correctly spelled words alone and split the run-together strings back into their component words.
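One common way to attack this is dictionary-based word segmentation with dynamic programming: try every prefix of the string that is a known word, then recursively segment the rest. Below is a minimal sketch in Python (the alternative mentioned in this question), assuming you have a word list available. The tiny `WORDS` set and the `segment` function are hypothetical, for illustration only; a real run would load a full dictionary. It prefers the split with the fewest pieces, so a string that is already a dictionary word is left intact, and a string with no complete split is returned unchanged.

```python
from functools import lru_cache

# Hypothetical, tiny dictionary for illustration only; in practice you
# would load a real word list (assumption: none is named in the question).
WORDS = {"the", "victory", "was", "great", "my", "name", "is"}


def segment(token, words=WORDS):
    """Split `token` into dictionary words using dynamic programming.

    Returns the segmentation with the fewest pieces, or [token] unchanged
    if no complete segmentation exists.
    """
    lower = token.lower()
    n = len(lower)

    @lru_cache(maxsize=None)
    def best(i):
        # Shortest tuple of (start, end) spans covering lower[i:], or None
        if i == n:
            return ()
        result = None
        for j in range(i + 1, n + 1):
            if lower[i:j] in words:
                rest = best(j)
                if rest is not None and (result is None or 1 + len(rest) < len(result)):
                    result = ((i, j),) + rest
        return result

    spans = best(0)
    if not spans:
        # No full segmentation (or empty input): leave the token alone
        return [token]
    # Slice the original token so capitalization is preserved
    return [token[a:b] for a, b in spans]


if __name__ == "__main__":
    vector = ["The", "victory", "wasgreat", "Mynameis"]
    print([" ".join(segment(w)) for w in vector])
    # ['The', 'victory', 'was great', 'My name is']
```

"Fewest pieces" is only a heuristic: with a large dictionary containing short words such as "a" or "i", ambiguous strings can have several equally short splits, and weighting candidates by word frequency is a common refinement.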
I'm curious whether there's any setup in R to handle this type of problem. I think there are several Python programs that would handle this much better, but my Python skills are substantially weaker (bordering on nonexistent). Still, I'd be willing to consider Python as an alternative.
Any suggestions?