I've completely revamped my answer based on your specification and Anil's answer below, which is much more widely applicable than what I originally had.
library(tm)
# Here we pretend that your texts are like this
text <- c("wall,", "expression.", "ef.", ":ok", "A.", "3.14", "91.8.10",
"w.a.ll, 6513.645+1646-5")
# and we create a corpus with them, like the one you show
corp <- Corpus(VectorSource(text))
# you create a function with any of the solutions that we've provided here
# I'm taking AnilGoyal's because it's better than my rushed purrr one.
my_remove_punct <- function(x) {
  gsub('(?<!\\d)[[:punct:]](?=\\D)?', '', x, perl = TRUE)
}
# pass the function to tm_map
new_corp <- tm_map(corp, my_remove_punct)
# Applying the function gives a warning about dropping documents, but that is a bug in the tm package.
# Print the documents to confirm that the contents are correct. The last three lines of output
# are the return value of sapply(): all the individual documents together.
sapply(new_corp, print)
#> [1] "wall"
#> [1] "expression"
#> [1] "ef"
#> [1] "ok"
#> [1] "A"
#> [1] "3.14"
#> [1] "91.8.10"
#> [1] "wall 6513.645+1646-5"
#> [1] "wall" "expression" "ef"
#> [4] "ok" "A" "3.14"
#> [7] "91.8.10" "wall 6513.645+1646-5"
The warning you receive about "dropping documents" is spurious: as the printed output shows, all eight documents are still there. An explanation is in this other SO question.
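If you want to check the pattern on its own, without tm and without the warning, here is a minimal sketch in base R. It uses the same regular expression as my_remove_punct; the variable names pattern, x and cleaned are just placeholders I made up. The negative lookbehind (?<!\\d) is what protects punctuation that directly follows a digit.

```r
# Same pattern as in my_remove_punct, applied to a plain character vector.
# (?<!\\d) means "not preceded by a digit", so "3.14" keeps its dot.
pattern <- "(?<!\\d)[[:punct:]](?=\\D)?"
x <- c("wall,", "3.14", "w.a.ll, 6513.645+1646-5")

cleaned <- gsub(pattern, "", x, perl = TRUE)

# Inspect each element next to its original
for (i in seq_along(x)) {
  cat(sprintf("%-25s -> %s\n", x[i], cleaned[i]))
}
```

The results should match the corresponding documents in the corpus output above.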
In the future, note that you can get better answers more quickly by providing raw data: call dput on your object, something like dput(TextDoc). If the output is too long, you can subset the object first.
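As a rough illustration of the dput tip, here is what it looks like on the example vector from this answer, subset to the first three elements with head(). Your own object will be different, so treat this only as a sketch of the idea.

```r
# Example texts from this answer (stand-in for your real data)
text <- c("wall,", "expression.", "ef.", ":ok", "A.", "3.14", "91.8.10",
          "w.a.ll, 6513.645+1646-5")

# dput() prints R code that recreates the object;
# head() keeps the output short enough to paste into a question
dput(head(text, 3))
#> c("wall,", "expression.", "ef.")
```

Anyone answering can then paste that output straight into their R session and work with the same data you have.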