I am cleaning up some text in R. I would like to use stringi, but I am happy to use other packages.
Some of the words are hyphenated across two lines, so I get substrings like "halfword-\nsecondhalfword".
I also have strings such as "----\nword" and " -\n" (and some others) that I do not want to change.
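To make this reproducible, here is a small input vector `x` covering the cases above. The exact strings are made up for illustration. It also shows why a blanket fixed-string replacement of "-\n" is not what I want: it joins the hyphenated word, but it also changes the other two strings.

```r
library(stringi)

# Hypothetical sample input: a word split across lines,
# a run of dashes before a newline, and a dash preceded by a space
x <- c("halfword-\nsecondhalfword", "----\nword", "a -\nb")

# Naive approach: remove every "-\n", regardless of what comes before it
naive <- stri_replace_all_fixed(x, "-\n", "")

# naive is now: "halfwordsecondhalfword", "---word", "a b"
# Only the first element is what I actually want.
```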
What I want to do is find every substring matching "[a-z]-\n", keep the letter matched by [a-z], and remove only the "-\n" characters.
I do not want to remove every "-\n", and I do not want to remove the [a-z] letter.
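One way to express this, sketched with stringi (ICU regex) on the same hypothetical vector `x`, is a lookbehind: `(?<=[a-z])` requires a lowercase letter immediately before "-\n" without making that letter part of the match, so only the "-\n" is removed. A capture group with a `$1` backreference does the same thing, and base R `gsub` can do it without any extra package. This assumes plain "\n" line endings and lowercase ASCII letters only.

```r
library(stringi)

# Hypothetical sample input (same cases as above)
x <- c("halfword-\nsecondhalfword", "----\nword", "a -\nb")

# Lookbehind: match "-\n" only when directly preceded by a lowercase letter;
# the letter is not part of the match, so it is kept
fixed_lb <- stri_replace_all_regex(x, "(?<=[a-z])-\n", "")

# Capture group: match the letter as well, then put it back with $1
fixed_cg <- stri_replace_all_regex(x, "([a-z])-\n", "$1")

# Base R alternative: \\1 is the backreference syntax in gsub
fixed_base <- gsub("([a-z])-\n", "\\1", x)

# All three give: "halfwordsecondhalfword", "----\nword", "a -\nb"
```

If the text can also contain Windows line endings or capitalized words, the pattern would need widening, for example "(?<=[A-Za-z])-\r?\n"; whether that applies depends on the data.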
Thanks!