
This question may have been asked before, but I couldn't find a solid answer that fits the pattern in my data; hopefully it is simple to answer. I have polling data with a column that looks similar to this:

Sample
1000 RV
456 LV
678 A

I want to strip off the letters and put them in one column, with the numbers in another, so that it looks like this:

Sample    Type
1000      RV
456       LV
678       A

How can I simply do this without going cell by cell?

a.powell

3 Answers


There are a lot of ways to achieve this.

  1. gsub

    sample <- c("123ABC", "234CBA", "999ETC")
    
    a <- gsub("[[:digit:]]", "", sample)    # remove the digits, keeping the letters
    b <- gsub("[^[:digit:]]", "", sample)   # remove the non-digits, keeping the numbers
    
  2. stringr

    library(stringr)
    a  <- as.numeric(str_extract(sample, "[0-9]+"))  # the number, as numeric
    b  <- str_extract(sample, "[A-Za-z]+")           # the letters only
    
  3. The approach Psidom mentions in a comment (I haven't tested it, but I trust him)
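The two gsub() calls in item 1 return separate vectors; to get the two-column layout the question asks for, they can be wrapped into a data.frame. This is a minimal sketch assuming each value is a number plus a type code made only of letters (any spaces are discarded); split_sample is a hypothetical helper name, not part of the answer above.

```r
# Hypothetical helper: split strings such as "1000 RV" or "123ABC" into a
# numeric Sample column and a character Type column using gsub().
split_sample <- function(x) {
  numbers <- gsub("[^[:digit:]]", "", x)  # drop everything that is not a digit
  types   <- gsub("[^[:alpha:]]", "", x)  # drop everything that is not a letter (spaces too)
  data.frame(Sample = as.numeric(numbers),
             Type = types,
             stringsAsFactors = FALSE)
}

polls <- split_sample(c("1000 RV", "456 LV", "678 A"))
str(polls)
```

This assumes the type codes never contain digits; if they did, those digits would be merged into the Sample number.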

Hack-R

This produces a data.frame with a numeric Sample column and a character Type column, as your example suggests. As others have mentioned, there are many ways to accomplish this.

sample <- c('1000      RV',
            '456       LV',
            '678       A')

A <- strsplit(sample, '\\s+')                    # Split on whitespace; returns a list of 3 character vectors
B <- unlist(A)                                   # Flatten the list into a length-6 character vector
C <- matrix(B, ncol = 2, byrow = TRUE)           # Fill a 3x2 character matrix row by row
D <- as.data.frame(C, stringsAsFactors = FALSE)  # Convert to data.frame so columns can have different types

# All together...
D <- as.data.frame(matrix(unlist(strsplit(sample, '\\s+')), ncol = 2, byrow = TRUE),
                   stringsAsFactors = FALSE)

D[ ,1] <- as.numeric(D[ ,1])         # Convert first column to numeric; second remains character
colnames(D) <- c('Sample', 'Type')   # Add column names

> D
  Sample Type
1   1000   RV
2    456   LV
3    678    A
> str(D)
'data.frame':   3 obs. of  2 variables:
 $ Sample: num  1000 456 678
 $ Type  : chr  "RV" "LV" "A"
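Since each value is already whitespace-separated, base R's read.table() can also parse the strings directly through its text argument, which avoids the list/matrix conversions. This is an alternative sketch, not part of the answer above, assuming every entry holds exactly one number and one type code separated by whitespace; poll_strings is a hypothetical variable name.

```r
poll_strings <- c("1000      RV",
                  "456       LV",
                  "678       A")

# read.table() treats each element of `text` as one line of input and
# splits fields on whitespace by default; colClasses fixes the column types.
D <- read.table(text = poll_strings,
                col.names = c("Sample", "Type"),
                colClasses = c("numeric", "character"))
str(D)
```

Setting colClasses explicitly gives the same column types as the strsplit() version (numeric Sample, character Type) without a separate as.numeric() step.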
Ben Fasoli

We can use sub() to drop the leading number (and any space after it):

df1 <- data.frame(Sample = c("1000 RV", "456 LV", "678 A"), stringsAsFactors = FALSE)
df1$Type <- sub("^\\d+\\s*", "", df1$Sample)
df1$Type
#[1] "RV" "LV" "A"

If we need both parts as separate columns, tstrsplit() from data.table can split on whitespace:

library(data.table)
setDT(df1)[, setNames(tstrsplit(Sample, "\\s+"), c("Sample", "Type"))]
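Splitting on whitespace fails when there is no space between the number and the letters (as in "123ABC" from the first answer). A base-R sketch using capture groups with regexec() and regmatches() handles both forms; it assumes each value is digits followed by letters, and split_with_regex is a hypothetical helper name. Entries that do not match become NA instead of being silently dropped.

```r
# Hypothetical helper: capture the leading number and the trailing letters
# separately, so "1000 RV" and "123ABC" are both handled.
split_with_regex <- function(x) {
  # Group 1: leading digits; group 2: trailing letters; optional whitespace between
  pattern <- "^([[:digit:]]+)[[:space:]]*([[:alpha:]]+)$"
  m <- regmatches(x, regexec(pattern, x))
  # Each element is c(full_match, digits, letters), or character(0) when the
  # entry did not match; pad non-matches with NA so rows stay aligned with x
  m <- lapply(m, function(g) if (length(g) == 3) g else rep(NA_character_, 3))
  parts <- do.call(rbind, m)
  data.frame(Sample = as.numeric(parts[, 2]),
             Type = parts[, 3],
             stringsAsFactors = FALSE)
}

res <- split_with_regex(c("1000 RV", "123ABC", "oops"))
str(res)
```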
akrun