Remove duplicate characters from strings

Question

I am trying to remove duplicate characters from strings.

dput(test)
c("APAAAAAAAAAAAPAAPPAPAPAAAAAAAAAAAAAAAAAAAAAAAAPPAPAAAAAAPPAPAAAPAPAAAAP", 
"AAA", "P", "P", "A", "P", "P", "APPPPPA", "A", "P", "AA", "PP", 
"PPA", "P", "P", "A", "P", "APAP", "P", "PA")

I wrote a function to sort the characters within each string:

strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

Then I use gsub to collapse runs of consecutive identical characters:

gsub("(.)\\1{2,}", "\\1", strSort(test))

This gives the following output:

gsub("(.)\\1{2,}", "\\1", strSort(test))
 [1] "AP"   "A"    "P"    "P"    "A"    "P"    "P"    "AAP"  "A"    "P"    "AA"   "PP"   "APP"  "P"    "P"    "A"    "P"    "AAPP" "P"    "AP"

The output should contain at most one A and at most one P per string.

score 2 · Answer 1 · answered Mar 20 '21 at 21:26

On the strsplit output, we need to apply unique to the sorted elements:

sapply(strsplit(test, ""), function(x) 
       paste(unique(sort(x)), collapse=""))
#[1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "AP" "A"  "P"  "A"  "P"  "AP" "P"  "P"  "A"  "P"  "AP" "P"  "AP"
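
As a side note on why the gsub attempt in the question falls short: `\\1{2,}` demands at least two further copies of the captured character, so only runs of three or more collapse, and "AA" or "PP" pass through unchanged. A minimal sketch, assuming the sort-then-regex approach is kept and using a small illustrative subset of the data (`x` and `dedup_regex` are names made up for this example):

```r
# Same helper as in the question: sort the characters within each string
strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")

# Illustrative subset of `test`
x <- c("AA", "PPA", "APAP", "APPPPPA")

# `+` matches one or more repeats, so runs of length 2 collapse as well
dedup_regex <- gsub("(.)\\1+", "\\1", strSort(x))
dedup_regex
# [1] "A"  "AP" "AP" "AP"
```

This only works because sorting first makes all copies of a character adjacent; on unsorted input the same pattern would leave non-adjacent duplicates in place.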

score 2 · Accepted Answer · answered Mar 21 '21 at 03:26

Using regex you can do:

gsub('(?:(.)(?=(.*)\\1))', '', test, perl = TRUE)

#[1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "PA" "A"  "P"  "A"  "P"  "PA"
#[14] "P"  "P"  "A"  "P"  "AP" "P"  "PA"

The regex has been taken from here.
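
A short sketch of what the pattern appears to do: `(.)` captures one character and the lookahead `(?=(.*)\\1)` succeeds only if that same character occurs again later in the string, so every occurrence except the last is deleted. That would explain why some results come out as "PA" rather than "AP". The vector `x` and the names `last_kept` and `sorted_result` below are illustrative, and the final sorting step is an optional addition, not part of the original answer:

```r
# Illustrative subset of `test`
x <- c("APPPPPA", "PPA", "APAP", "AA")

# Delete every character that appears again further right;
# only the last occurrence of each character survives
last_kept <- gsub("(?:(.)(?=(.*)\\1))", "", x, perl = TRUE)
last_kept
# [1] "PA" "PA" "AP" "A"

# Optional: sort the surviving characters if a fixed order is wanted
sorted_result <- sapply(strsplit(last_kept, ""),
                        function(ch) paste(sort(ch), collapse = ""))
sorted_result
# [1] "AP" "AP" "AP" "A"
```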

Ronak Shah

score 1 · Answer 3 · answered Mar 20 '21 at 21:53

Here is another option using utf8ToInt + intToUtf8:

> sapply(test, function(x) intToUtf8(sort(unique(utf8ToInt(x)))), USE.NAMES = FALSE)
 [1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "AP" "A"  "P"  "A"  "P"  "AP" "P"  "P" 
[16] "A"  "P"  "AP" "P"  "AP"
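
To make the round trip easier to follow, here is a step-by-step sketch on a single string. The variable names and the `dedup_chars` wrapper are hypothetical, introduced only for illustration; `vapply` is swapped in for `sapply` so the return type is fixed to one string per input:

```r
s <- "PPA"

codes <- utf8ToInt(s)          # integer code points: 80 80 65
uniq  <- sort(unique(codes))   # drop duplicates, then sort numerically: 65 80
one   <- intToUtf8(uniq)       # back to a single string: "AP"

# Hypothetical wrapper applying the same steps to a character vector
dedup_chars <- function(v) {
  vapply(v, function(x) intToUtf8(sort(unique(utf8ToInt(x)))),
         character(1), USE.NAMES = FALSE)
}

dedup_chars(c("PPA", "AA", "APAP"))
# [1] "AP" "A"  "AP"
```

Note that this orders characters by code point rather than by locale collation, which makes no difference for plain "A" and "P" but may matter for other alphabets.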

ThomasIsCoding
