I have a character variable with many words. For example...
words
1 funnel
2 funnels
3 sprout
4 sprouts
5 sprouts.
6 chicken
7 chicken)
8 chicken(2)
Many of the words are the same, just with an `s` on the end or a stray symbol (`)`, `,`, `.`) added as a typo.
I want to find words that are plurals/singulars of each other, so I can remove the `s` from the end and be left with only singular values.
I also want to remove all the symbols at the end of a word that are typos. For example,
* remove the `)` from `chicken)` because the parenthesis is not balanced
* but preserve `chicken(2)`
My current attempt has been:
# Find words that end in `s`
grep("s$", df$words, ignore.case = TRUE, value = TRUE)
# Remove the `s` from the end of words
df$words <- gsub("s$", "", df$words, ignore.case = TRUE)
# Remove any typos with symbols at the end of a word
gsub("[^A-z|0-9]|$", "", df$words)
My final line of code also changes words such as `chicken(2)`, which I do not wish to edit.
The `grep` call shows me many plural words (words that end in `s`), however I have no idea whether there is a singular version (the same word without the `s`).
How can I find words that end in stray grammar symbols / punctuation marks (i.e. `(`, `.`, `!`) and remove those? That is, remove unbalanced parentheses such as `chicken)`, but not `chicken(2)`.
For example, the desired output is...
words
1 funnel
2 funnel
3 sprout
4 sprout
5 sprout
6 chicken
7 chicken
8 chicken(2)
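One possible approach, as a minimal sketch in base R rather than a definitive solution: first strip trailing punctuation unless it is a `)` that closes a balanced pair, then drop a trailing `s` only when the shorter form also occurs in the data. The helper names `is_balanced` and `strip_trailing` are hypothetical (they do not come from any package), and the sketch assumes plurals are formed by a plain `s`, so `es`/`ies` plurals are not handled.

```r
# Sample data copied from the question
df <- data.frame(
  words = c("funnel", "funnels", "sprout", "sprouts", "sprouts.",
            "chicken", "chicken)", "chicken(2)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every ")" closes an earlier "("
# and no "(" is left open
is_balanced <- function(x) {
  vapply(strsplit(x, ""), function(ch) {
    depth <- cumsum((ch == "(") - (ch == ")"))
    length(depth) == 0 || (all(depth >= 0) && depth[length(depth)] == 0)
  }, logical(1))
}

# Hypothetical helper: repeatedly drop the last character while it is
# punctuation, except when it is a ")" and the whole word is balanced
strip_trailing <- function(x) {
  repeat {
    drop <- grepl("[[:punct:]]$", x) & !(grepl("\\)$", x) & is_balanced(x))
    if (!any(drop)) break
    x[drop] <- sub(".$", "", x[drop])
  }
  x
}

cleaned <- strip_trailing(df$words)

# Drop a trailing "s" only when the shorter form also occurs in the data,
# so a word that merely ends in "s" is left alone
candidate <- sub("s$", "", cleaned, ignore.case = TRUE)
has_singular <- grepl("s$", cleaned, ignore.case = TRUE) & candidate %in% cleaned
df$words <- ifelse(has_singular, candidate, cleaned)

print(df)
```

On the sample data this leaves `chicken(2)` untouched and reduces the other rows to `funnel`, `sprout` and `chicken`. The `%in%` check is case-sensitive, so mixed-case data may need `tolower()` first.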