I have a vector with acronyms like "U.S."

I want to remove the dots between the characters, but only in acronyms; I do not want to remove every dot in the whole document.

I can do this by using gsub:

text <- c("U.S.", "U.N.", "C.I.A")
gsub("U.S.", "US", text, fixed = TRUE)  # fixed = TRUE: treat the dots literally, not as regex wildcards

But how can I tell R to remove the dots in all possible acronyms (i.e., also in "U.N." or "C.I.A.")?

Bhargav Rao
feder80

2 Answers

You can use a word boundary here:

gsub('\\b\\.', '', text)

Alternatively, a simpler option is stated in the comments.
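To see what the word-boundary pattern does, here is a small sketch run on the vector from the question and on a full sentence. The `sentence` variable is made up for illustration, and `perl = TRUE` is an assumption added here so that `\\b` follows PCRE semantics. Note that the pattern removes any dot that directly follows a letter or digit, so it also strips an ordinary sentence-final period.

```r
# Vector from the question
text <- c("U.S.", "U.N.", "C.I.A")

# Remove every dot that directly follows a word character
out_vec <- gsub("\\b\\.", "", text, perl = TRUE)
out_vec
# [1] "US"  "UN"  "CIA"

# Hypothetical sentence: the final period follows a letter, so it is removed too
sentence <- "The U.S. is big."
out_sentence <- gsub("\\b\\.", "", sentence, perl = TRUE)
out_sentence
# [1] "The US is big"
```

So this works well for a vector that contains only acronyms, but probably not for running text where other periods should survive.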

Chirayu Chamoli

Your question seems a bit different from the code you provide: you want to strip the dots from acronyms in text that presumably also contains dots that aren't part of acronyms or abbreviations.

This code extracts and identifies acronyms by searching for repeated capital-dot combinations. The matches can be manually checked and filtered mid-workflow to make sure it's not picking up anything odd. It then replaces them using the mgsub code from "Replace multiple arguments with gsub":

text1 <- c("The U.S. and the C.I.A. are acronyms. They should be matched.")
m <- gregexpr("([A-Z]\\.)+", text1)
matches <- regmatches(text1, m)[[1]]
matches_nodot <- gsub(".", "", matches, fixed = TRUE)  # gsub is vectorised over matches

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern)!=length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in seq_along(pattern)) {  # seq_along is safe when pattern is empty
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}


text2 <- mgsub(matches, matches_nodot, text1, fixed = TRUE)  # match "U.S." literally, not as a regex
text2
# [1] "The US and the CIA are acronyms. They should be matched."
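The same idea can be sketched in a single pass with base R's `regmatches<-`, which avoids building the pattern/replacement pairs by hand. This is only an illustration under some assumptions: the function name `strip_acronym_dots` is hypothetical, and the pattern requires at least two capital-dot pairs (so a lone "A." is left alone), which is a design choice made here rather than part of the code above.

```r
# Hypothetical helper: drop the dots inside runs such as "U.S." or "C.I.A."
strip_acronym_dots <- function(x) {
  # At least two capital-letter + dot pairs, starting at a word boundary
  m <- gregexpr("\\b(?:[A-Z]\\.){2,}", x, perl = TRUE)
  # Replace each match with a copy of itself that has its dots removed
  regmatches(x, m) <- lapply(regmatches(x, m), function(hit) {
    gsub(".", "", hit, fixed = TRUE)
  })
  x
}

res_sentence <- strip_acronym_dots(
  "The U.S. and the C.I.A. are acronyms. They should be matched."
)
res_sentence
# [1] "The US and the CIA are acronyms. They should be matched."

res_vec <- strip_acronym_dots(c("U.S.", "U.N.", "C.I.A"))
res_vec
# [1] "US"  "UN"  "CIA"
```

One caveat: when a sentence ends with an acronym, as in "in the U.S.", the final dot is treated as part of the acronym and removed as well, so such cases may still need a manual check.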
Michael Veale