
I have a column of data in an R data frame that has values such as:

Blue-#105
Green-#8845
Yellow-#5454
Blue-#999

I want to remove the trailing number part (everything from -# onward) so that Blue-#999 and Blue-#105 are considered the same thing when plotting. How could I accomplish this?

matsjoyce
Eric Brotto
  • You might check this question http://stackoverflow.com/questions/3703803/apply-strsplit-rowwise/ and my answer, which links to problems similar to yours. – Marek Sep 28 '10 at 06:09
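In the spirit of the strsplit approach that comment points to, here is a minimal sketch (not taken from the linked answer) that splits each value at the literal -# and keeps the piece before it. It assumes the values are in a character vector; the names colors, parts and base_colors are illustrative only.

```r
# Illustrative input: the values from the question
colors <- c("Blue-#105", "Green-#8845", "Yellow-#5454", "Blue-#999")

# Split each string at the literal "-#" (fixed = TRUE turns off regex matching),
# then keep the first piece of every split result
parts <- strsplit(colors, "-#", fixed = TRUE)
base_colors <- vapply(parts, `[`, character(1), 1)

print(base_colors)  # "Blue" "Green" "Yellow" "Blue"
```

Using vapply() rather than sapply() guarantees a plain character vector comes back, even for edge cases.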

2 Answers


Use regular expressions:

> DF <- data.frame(col=c("Blue-#105", "Green-#8845", "Blue-#999"))
> DF
          col
1   Blue-#105
2 Green-#8845
3   Blue-#999
> DF$col <- gsub("-\\#.*", "", DF$col)
> DF
    col
1  Blue
2 Green
3  Blue
> 

Here we say that everything from -# onward (.* in regular expression lingo: any character, the dot, repeated as many times as it fits, the star) gets replaced by the empty string, or in other words, removed. The backslashes in front of # are harmless but not actually required: # has no special meaning in a default R regular expression, and inside a quoted R string it does not start a comment, so "-#.*" works just as well.
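If -# could ever appear somewhere other than right before the trailing number, a stricter pattern that only strips -# followed by digits at the very end of the string may be safer. A sketch under that assumption (the vector names colors and base_colors are hypothetical), with a quick check that the two Blue entries now fall into one group for counting or plotting:

```r
# Illustrative input: the values from the question
colors <- c("Blue-#105", "Green-#8845", "Yellow-#5454", "Blue-#999")

# Remove "-#" plus one or more digits, but only when anchored at the end ($)
base_colors <- sub("-#[0-9]+$", "", colors)
print(base_colors)  # "Blue" "Green" "Yellow" "Blue"

# As a factor, both Blue entries share one level: Blue 2, Green 1, Yellow 1
print(table(factor(base_colors)))
```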

Dirk Eddelbuettel

Use the sub or gsub function. For your example you could do something like:

newcolors <- sub("^([^-]*)-.*$", "\\1", oldcolors )

This assumes that the colors are in a vector 'oldcolors' and puts the results into 'newcolors'. The pattern starts at the beginning of the string (^), then matches 0 or more characters that are not dashes ([^-]*); the parentheses around that part say to save what is matched. Then it matches a dash followed by any further characters (.*) up to the end of the string ($). The matched portion (the entire string) is then replaced by whatever was matched within the parentheses (the color), referred to as \\1 in the replacement.
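To make the capture-group idea concrete, here is a sketch that applies the pattern above to the values from the question, plus a hypothetical variant with a second group that captures the number instead and refers to it as \\2. The numbers variable and the stricter -# plus digits pattern are illustrative assumptions, not part of the original answer.

```r
oldcolors <- c("Blue-#105", "Green-#8845", "Yellow-#5454", "Blue-#999")

# Group 1 keeps everything before the first dash (the color name)
newcolors <- sub("^([^-]*)-.*$", "\\1", oldcolors)

# Hypothetical variant: group 2 captures the digits after "-#" instead
numbers <- as.integer(sub("^([^-]*)-#([0-9]+)$", "\\2", oldcolors))

print(newcolors)  # "Blue" "Green" "Yellow" "Blue"
print(numbers)    # 105 8845 5454 999
```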

Greg Snow
  • Hey Greg, I like how concise your answer is, but I am getting an error: unexpected ',' in "newdatafr <- gsub("^([^-]*)-.*$")," newdatafr is equivalent to newcolors in your example. – Eric Brotto Sep 27 '10 at 16:02
  • @Eric: then I think you should copy-paste better. It works fine for me, and the error you provide does not show the same code as Greg posted here. – Joris Meys Sep 27 '10 at 16:15
  • FWIW my `gsub()` call is shorter / more concise than the `sub()` call shown here. Otherwise, they are of course essentially equivalent. – Dirk Eddelbuettel Sep 27 '10 at 18:16
  • Yes, the 2 regexes are equivalent for the example data given. The difference is that Dirk's focuses on what to throw away and mine focuses on what to keep. Which is better would depend on possible differences in future data. – Greg Snow Sep 27 '10 at 20:26
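A sketch of the kind of future data where the two patterns diverge, assuming a hypothetical color name that itself contains a dash. The throw-away pattern (written here as "-#.*", without the optional backslashes) keeps the full name, while the keep-what-precedes-the-first-dash pattern cuts it short. The vector x and the result names are illustrative only.

```r
# Hypothetical data: the second color name contains a dash of its own
x <- c("Blue-#105", "Light-Blue-#12")

drop_suffix <- gsub("-#.*", "", x)            # throw away "-#" and everything after it
keep_prefix <- sub("^([^-]*)-.*$", "\\1", x)  # keep only the text before the first dash

print(drop_suffix)  # "Blue" "Light-Blue"
print(keep_prefix)  # "Blue" "Light"
```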