
Suppose I have a character vector that looks like this, where \n indicates a new line within an element:

m
# [1] AA\nBB\nCC\nDD
# [2] AA\nBB\nEE\nDD
# [3] AA\nBB\nEE\nDD
# [4] AA\nBB\nCC\nDD
# [5] AA\nBB\nFF\nDD

I want to remove every element that has a duplicate, so that I am left with

m
# [1] AA\nBB\nFF\nDD

Any suggestions? Thanks very much

The real data that I am trying to manipulate is very messy:

head(m)
[1] "FT   motif           619..622\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           complement(619..622)\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           8662..8667\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           complement(8662..8667)\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           205..210\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           complement(205..210)\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           419..423\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           complement(419..423)\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           16843..16858\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77\nFT   motif           complement(16843..16858)\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77"                    
[2] "FT   motif           726..729\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           complement(726..729)\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           13022..13027\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           complement(13022..13027)\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           214..219\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           complement(214..219)\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           474..478\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           complement(474..478)\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           33075..33090\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77\nFT   motif           complement(33075..33090)\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77"                
[3] "FT   motif           781..784\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           complement(781..784)\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           13132..13137\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           complement(13132..13137)\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           470..475\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           complement(470..475)\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           507..511\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           complement(507..511)\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           36423..36438\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77\nFT   motif           complement(36423..36438)\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77"                
[4] "FT   motif           781..784\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           complement(781..784)\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           13132..13137\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           complement(13132..13137)\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           470..475\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           complement(470..475)\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           507..511\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           complement(507..511)\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           36423..36438\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77\nFT   motif           complement(36423..36438)\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77"    
[5] "FT   motif           1167..1170\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           complement(1167..1170)\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           16052..16057\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           complement(16052..16057)\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           14262..14267\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           complement(14262..14267)\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           1207..1211\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           complement(1207..1211)\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           44826..44841\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77\nFT   motif           complement(44826..44841)\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77"
[6] "FT   motif           1167..1170\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           complement(1167..1170)\nFT                   /note=GATC\nFT                   /color=48 249 173\nFT   motif           16052..16057\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           complement(16052..16057)\nFT                   /note=CTGCAG\nFT                   /color=90 236 150\nFT   motif           14262..14267\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           complement(14262..14267)\nFT                   /note=ACCACC\nFT                   /color=197 13 106\nFT   motif           1207..1211\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           complement(1207..1211)\nFT                   /note=CC(A|T)GG\nFT                   /color=252 213 234\nFT   motif           44826..44841\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77\nFT   motif           complement(44826..44841)\nFT                   /note=CCAC.{8}TGA(C|T)\nFT                   /color=132 205 77"

For example, I am trying to get rid of elements 4 and 6, because they are exact duplicates of elements 3 and 5.

alki

1 Answer


I think the key function to use is duplicated(), which flags the second and later occurrences of each value. Then

m[!m %in% m[duplicated(m)]]

will give you all elements that occur exactly once: any value that has a duplicate is dropped entirely, including its first occurrence.
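
As a rough check, here is a minimal sketch that rebuilds the toy vector from the question and applies the expression above. The names dup_values, only_once, only_once_alt and first_copies are made up for illustration. The sketch also shows unique(), on the assumption that for the real data the goal may be to keep one copy of each element (dropping 4 and 6 but keeping 3 and 5) rather than dropping both copies.

```r
# Toy vector mirroring the question; each element is one multi-line string.
m <- c("AA\nBB\nCC\nDD",
       "AA\nBB\nEE\nDD",
       "AA\nBB\nEE\nDD",
       "AA\nBB\nCC\nDD",
       "AA\nBB\nFF\nDD")

# duplicated() marks only the second and later occurrences of a value,
# so m[duplicated(m)] holds the values that appear more than once.
dup_values <- m[duplicated(m)]

# Drop every element whose value is among those repeated values.
only_once <- m[!m %in% dup_values]

# Equivalent form: scan from both ends and drop anything flagged either way.
only_once_alt <- m[!(duplicated(m) | duplicated(m, fromLast = TRUE))]

# Alternative goal (an assumption about the real data): keep the first
# copy of each distinct element instead of removing all copies.
first_copies <- unique(m)

cat(only_once, "\n")
```

With this toy input, only_once and only_once_alt both hold the single FF element, while first_copies keeps the CC, EE and FF elements once each. Which of the two behaviours is wanted depends on whether the first copy of a repeated element should survive.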

kasterma