How to extract the trailing digits from a string in R?

Question

I have a column of data that looks like this:

**varX**

Q1#_1

Q1#_5

Q1#_10

I would like to edit the data to look like this:

**varX**

1

5

10

Is there a command I could use to simply keep all information after the underscore?

You can use `gsub(".*_(\\d+)", "\\1", df$varX)`. It will work even if the pattern at the beginning of your string changes. – Jilber Urbina, Dec 10 '18 at 16:33
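To illustrate the comment above, here is a minimal sketch on made-up data. The vector `x` is a hypothetical stand-in for `df$varX`, and the fourth value and the non-matching string are added only to show how the pattern behaves when the prefix changes or nothing matches.

```r
# Hypothetical sample data; the first three values mirror the question.
x <- c("Q1#_1", "Q1#_5", "Q1#_10", "Q22#_abc_7")

# ".*_" consumes everything up to the underscore that precedes the digits;
# "\\1" keeps only the captured run of digits.
digits <- gsub(".*_(\\d+)", "\\1", x)
print(digits)

# A string that does not match the pattern is returned unchanged.
unmatched <- gsub(".*_(\\d+)", "\\1", "no_digits_here")
print(unmatched)
```

Because unmatched strings pass through untouched, it may be worth checking the result before converting it to a number.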

score 1 · Answer 1 · answered Dec 10 '18 at 16:43

If you want a tidyverse solution, you can use str_extract from the stringr package:

library(dplyr)
library(stringr)

data %>%
  mutate(varX = str_extract(varX, "[0-9]+$")) %>%
  mutate(varX = as.numeric(varX)) # include this last line if you want a number rather than a character string
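As a rough sketch of what `str_extract` returns here, the example below uses a hypothetical data frame `df` and plain column assignment instead of the pipe, so that only stringr is needed. The `"none"` row is an assumption added to show the no-match case.

```r
library(stringr)

# Hypothetical sample data; the last row has no trailing digits.
df <- data.frame(varX = c("Q1#_1", "Q1#_5", "Q1#_10", "none"),
                 stringsAsFactors = FALSE)

# "[0-9]+$" matches a run of digits anchored at the end of the string.
trailing <- str_extract(df$varX, "[0-9]+$")

# Rows without a match come back as NA, and as.numeric() keeps them as NA.
df$varX <- as.numeric(trailing)
print(df)
```

Unlike the `gsub` approaches, a value without trailing digits becomes `NA` rather than being left as the original string.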

answered Dec 10 '18 at 16:43

Ben G

score 0 · Accepted Answer · answered Dec 10 '18 at 16:27

If the values always start with the Q1#_ string, you can do:

gsub("Q1#_", "", df$varX)
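Since this relies on the prefix being constant, one possible way to check that assumption first is sketched below. The vectors `same_prefix` and `mixed_prefix` are hypothetical sample data, not part of the question.

```r
# Hypothetical sample data; the second vector has one value with another prefix.
same_prefix  <- c("Q1#_1", "Q1#_5", "Q1#_10")
mixed_prefix <- c("Q1#_1", "Q2#_5")

# startsWith() is vectorised, so all() confirms every value shares the prefix.
print(all(startsWith(same_prefix, "Q1#_")))
print(all(startsWith(mixed_prefix, "Q1#_")))

# With a constant prefix the replacement leaves only the digits ...
clean <- gsub("Q1#_", "", same_prefix)
print(clean)

# ... but a value with a different prefix passes through untouched.
partial <- gsub("Q1#_", "", mixed_prefix)
print(partial)
```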

answered Dec 10 '18 at 16:27

Nikolay Nenov

score 0 · Answer 3 · answered Dec 10 '18 at 16:28

I think you're looking for sub, which substitutes a certain part of a string with something else. You can give it a regular expression if you want to get fancy, or just give it a literal:

VarX <- sub('Q1#_', '', VarX, fixed = TRUE)

The fancy way ("remove everything before and including the underscore") would be

VarX <- sub('^.*_', '', VarX)

And you may want to convert it to a numeric or an integer:

VarX <- as.integer(sub('Q1#_', '', VarX, fixed = TRUE)) # or as.numeric
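A small sketch putting these pieces together, on hypothetical sample data. It also shows the one behavioural difference between `sub` and the `gsub` used in the accepted answer: `sub` replaces only the first match. The `"a_b_c"` string is made up for that illustration.

```r
# Hypothetical sample data mirroring the question.
VarX <- c("Q1#_1", "Q1#_5", "Q1#_10")

literal <- sub("Q1#_", "", VarX, fixed = TRUE)  # exact substring, no regex
pattern <- sub("^.*_", "", VarX)                # everything up to the last underscore
print(identical(literal, pattern))

# sub() replaces only the first match; gsub() replaces every match.
first_only <- sub("_", "", "a_b_c")
all_of_them <- gsub("_", "", "a_b_c")
print(c(first_only, all_of_them))

# as.integer() turns the character digits into whole numbers.
VarX <- as.integer(pattern)
print(VarX)
```

For the data in the question there is only one match per string, so `sub` and `gsub` give the same result.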

score 0 · Answer 4 · answered Dec 10 '18 at 16:28

You could use regular expressions:

df[["varX"]] <- sub(".+_", "", df[["varX"]])
df
#   varX
# 1    1
# 2    5
# 3   10

Or, without regular expressions, using strsplit() with fixed = TRUE so that the separator is treated as a literal:

df[["varX"]] <- sapply(df[["varX"]], function(x) strsplit(x, "_", fixed = TRUE)[[c(1, 2)]])
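A vectorised variant of the `strsplit()` idea might look like the sketch below. It assumes the column is character (not factor) and, as a deliberate change from the line above, takes the last piece of each split rather than the second, which also copes with values that contain more than one underscore. The data frame is hypothetical sample data.

```r
# Hypothetical sample data; stringsAsFactors = FALSE keeps varX as character.
df <- data.frame(varX = c("Q1#_1", "Q1#_5", "Q1#_10"),
                 stringsAsFactors = FALSE)

# strsplit() returns one character vector per input string;
# fixed = TRUE treats "_" as a literal rather than a regular expression.
parts <- strsplit(df[["varX"]], "_", fixed = TRUE)

# Take the last piece of each split; vapply() enforces one string per row.
df[["varX"]] <- vapply(parts, function(p) p[length(p)], character(1))
print(df)
```

Calling `strsplit()` once on the whole column avoids the per-element anonymous function and the names that `sapply()` attaches to a character input.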

answered Dec 10 '18 at 16:28

s_baldur
