6

I have a column in a data frame where the values are letter-number combinations like G1, K8, A132, etc. I want to split the letter from the number but keep the number as a single value. I have been using strsplit, but it returns one element per character, as seen below, whereas I would like the output to be G and 10:

x <- "G10"
strsplit(x, "")[[1]][1]
"G"
strsplit(x, "")[[1]][-1]
"1" "0"

This leads to predictable downstream problems when I try to use the numbers as numbers. Here is a paste example where I would like to get "somethingelse_10":

z <- strsplit(x, "")[[1]][-1]
paste("somethingelse", z, sep = "_")
"somethingelse_1" "somethingelse_0"

Is there an easy way to split numbers from letters?

zach

3 Answers

15

You can use gsub to strip out either all non-digit characters or all digit characters, like so:

> x <- "A3"
> gsub("[^[:digit:]]","",x)
"3"
> gsub("[[:digit:]]","",x)
"A"

You can then use as.numeric to convert the digit string to a number, if desired.
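As a rough sketch of how the two gsub calls and as.numeric might be combined over a whole column, the helper below returns the letter and number parts side by side. The name `split_code` and the sample vector `codes` are made up for illustration, and the sketch assumes every value consists only of letters and digits.

```r
# Hypothetical helper (not from the original answer): split codes such as
# "G10" into a letter part and a numeric part.
split_code <- function(x) {
  data.frame(
    # Drop every digit, leaving the letters
    letter = gsub("[[:digit:]]", "", x),
    # Drop everything that is not a digit, then convert to numeric
    number = as.numeric(gsub("[^[:digit:]]", "", x)),
    stringsAsFactors = FALSE
  )
}

codes <- c("G10", "K8", "A132")  # assumed sample data
parts <- split_code(codes)
parts$letter
# [1] "G" "K" "A"
parts$number
# [1]  10   8 132
paste("somethingelse", parts$number, sep = "_")
# [1] "somethingelse_10"  "somethingelse_8"   "somethingelse_132"
```

Because gsub is vectorised, the same calls can be applied directly to a data frame column without an explicit loop.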

aaronjg
  • You could add a solution for extracting letters (see joran's answer). In fact, one could write a function to extract letters and numbers and have `apply` work on the data.frame column. :) – Roman Luštrik Jan 04 '12 at 09:20
  • @aaronjg to get letters you need an extra pair of [] – zach Jan 24 '12 at 17:08
10

The stringr package often has convenient functions for this sort of thing:

library(stringr)
str_extract(c("A1","B2","C123"),"[[:upper:]]")
#[1] "A" "B" "C"
str_extract(c("A1","B2","C123"),"[[:digit:]]+")
#[1] "1"   "2"   "123"

This assumes that each element has exactly one "letter" part and one "number" part, since str_extract only returns the first match in each string.
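To illustrate the first-match caveat without depending on stringr, here is a small base R sketch. It uses regexpr (first match only) and gregexpr (all matches) with regmatches as a stand-in for the stringr calls; the input "A1B22" is an invented example of a string that breaks the one-letter, one-number assumption.

```r
# Assumed example input with two separate number parts
y <- "A1B22"

# regexpr() finds only the first match, so the second number is silently dropped
first_only <- regmatches(y, regexpr("[[:digit:]]+", y))
first_only
# [1] "1"

# gregexpr() finds every match; regmatches() then returns a list, one element per input string
all_matches <- regmatches(y, gregexpr("[[:digit:]]+", y))[[1]]
all_matches
# [1] "1"  "22"
```

If the data may contain values like this, checking the number of matches per element before extracting is one way to catch them early.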

Dason
joran
4

If, as your comment suggests, you just have a single letter followed by one or more digits, you could do something like this:

x <- c("G10", "X1231", "y14522")
# Just grab the first letter
letter <- substring(x, 1, 1)
letter
# [1] "G" "X" "y"
# Grab everything except the first character and convert to numeric
number <- as.numeric(substring(x, 2, nchar(x)))
number
#[1]    10  1231 14522
Dason