I have a column of data from which I am trying to remove all characters that aren't numbers. The data looks like:

Col1
Name=12_Represse, Name=12_Represse, Name=12_Represse, Name=13_Heterochrom/l, Name=13_Heterochrom/lo
Name=13_Heterochrom/lo
Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx 

The output I expect is:

Col1
12,12,12,13,13
13
11,11,11,11,11,11

I've tried to adapt answers from other questions that remove specific strings, for example:

test <- str_replace_all(data$col1,"#[a-z,A-Z]*","")

However, neither this nor similar adaptations using gsub seem to work. I am new to R, so any guidance would help.

zx8754
DN1
Hi DN1. Can you add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610)? That way you can help others to help you! The solution is probably going to be something similar to `gsub("[^0-9]", "", data$col1)`, but I'd rather test my suggestion with some sample data (that's why an MRE is so useful). – dario Feb 17 '20 at 16:14

2 Answers

Here are two ways of keeping just the digits.

1. Base R.

gsub("[^[:digit:],]", "", data$col1)
#[1] "12,12,12,13,13"    "13"                "11,11,11,11,11,11"

2. Package stringr.

stringr::str_remove_all(data$col1, "[^[:digit:],]")
#[1] "12,12,12,13,13"    "13"                "11,11,11,11,11,11"

Data.

col1 <- c('Name = "12_Represse", Name="12_Represse", Name="12_Represse", Name="13_Heterochrom/l", Name="13_Heterochrom/lo"',
          'Name="13_Heterochrom/lo"',
          'Name="11_Weak_Tx", Name="11_Weak_Tx", Name="11_Weak_Tx", Name="11_Weak_Tx", Name="11_Weak_Tx", Name="11_Weak_Tx"')
data <- data.frame(col1)
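
As a minimal sketch of why the negated character class works while the pattern from the question does not, the snippet below uses base R only. The strings are shortened versions of the question's sample data, and the names `x`, `attempt`, `digits_only`, and `digits_commas` are illustrative, not from the original post.

```r
# Shortened sample strings in the unquoted form shown in the question.
x <- c("Name=12_Represse, Name=13_Heterochrom/l",
       "Name=13_Heterochrom/lo")

# The question's pattern begins with a literal "#". That character never
# occurs in the data, so nothing matches and the input comes back unchanged.
attempt <- gsub("#[a-z,A-Z]*", "", x)
identical(attempt, x)
#[1] TRUE

# Deleting every non-digit also deletes the commas that separate the values.
digits_only <- gsub("[^0-9]", "", x)
digits_only
#[1] "1213" "13"

# Adding the comma to the negated class keeps the separators.
digits_commas <- gsub("[^[:digit:],]", "", x)
digits_commas
#[1] "12,13" "13"
```

The negated class `[^...]` matches any single character not listed inside it, so the last call removes everything except digits and commas.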
Rui Barradas

You could try:

gsub("[A-Z]|[a-z]|[=]|\\s|_|/", "", Col1)

So that if Col1 matches yours:

Col1 <- c("Name=12_Represse, Name=12_Represse, Name=12_Represse, Name=13_Heterochrom/l, Name=13_Heterochrom/lo", 
"Name=13_Heterochrom/lo", "Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx, Name=11_Weak_Tx "
)

You get

gsub("[A-Z]|[a-z]|[=]|\\s|_|/", "", Col1)
#> [1] "12,12,12,13,13"    "13"                "11,11,11,11,11,11"
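
One design difference is worth illustrating with a small sketch: this pattern lists the characters to delete, so any character that is not listed survives. The example below assumes the input contains double quotes, as in the sample data of the other answer; the names `quoted`, `drop_listed`, and `keep_digits` are illustrative.

```r
# A value in the quoted form used by the other answer's sample data.
quoted <- 'Name="13_Heterochrom/lo"'

# Deleting a fixed list of characters leaves the double quotes in place.
drop_listed <- gsub("[A-Z]|[a-z]|[=]|\\s|_|/", "", quoted)
drop_listed
#[1] "\"13\""

# Keeping only digits and commas removes the quotes as well.
keep_digits <- gsub("[^[:digit:],]", "", quoted)
keep_digits
#[1] "13"
```

For the unquoted data shown in the question, both patterns give the same result.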
Allan Cameron