1

I have a column of data that looks like this:

**varX**

Q1#_1

Q1#_5

Q1#_10

I would like to edit the data to look like this:

**varX**

1

5

10

Is there a command I could use to simply keep all information after the underscore?

Ben G
mdavis
You can use `gsub(".*_(\\d+)", "\\1", df$varX)`. It'll work even if the pattern at the beginning of your string changes. – Jilber Urbina Dec 10 '18 at 16:33
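To see what the capture-group pattern from the comment does, here is a minimal sketch run on the sample values from the question. The data frame name `df` and the integer conversion are assumptions, not part of the original post.

```r
# Sample values from the question; the data frame name `df` is an assumption.
df <- data.frame(varX = c("Q1#_1", "Q1#_5", "Q1#_10"), stringsAsFactors = FALSE)

# ".*_" consumes everything up to the last underscore, "(\\d+)" captures the
# digits that follow, and "\\1" replaces the whole match with the captured group.
cleaned <- gsub(".*_(\\d+)", "\\1", df$varX)

# Optional: convert the remaining digits to integers.
df$varX <- as.integer(cleaned)
print(df$varX)
```

Because the pattern does not mention `Q1#`, it should behave the same if the prefix changes, as long as the value ends in an underscore followed by digits.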

4 Answers

1

If you want a tidyverse solution, you can use str_extract from the stringr package:

library(dplyr)
library(stringr)

data %>%
  mutate(varX = str_extract(varX, "[0-9]+$")) %>%
  mutate(varX = as.numeric(varX)) # include this last line if you want a number rather than a character string
Ben G
0

In case you always have the Q1#_ string, you can do:

gsub("Q1#_", "", df$varX)
Nikolay Nenov
0

I think you're looking for sub, which substitutes a certain part of a string with something else. You can give it a regular expression if you want to go fancy, or just give it a literal:

VarX <- sub('Q1#_', '', VarX, fixed = TRUE)

The fancy way ("remove everything before and including the underscore") would be

VarX <- sub('^.*_', '', VarX)

And you may want to convert it to a numeric or an integer:

VarX <- as.integer(sub('Q1#_', '', VarX, fixed = TRUE)) # or as.numeric
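
As a rough illustration of how the literal and the regular-expression versions differ, the sketch below runs both on a small vector. The value "Q2#_7" is hypothetical, added only to show a prefix that the literal does not cover.

```r
# "Q2#_7" is a hypothetical value with a different prefix, added for illustration.
x <- c("Q1#_1", "Q1#_5", "Q2#_7")

# Literal match: only the exact string "Q1#_" is removed.
literal <- sub("Q1#_", "", x, fixed = TRUE)

# Regular expression: everything up to and including the last underscore is removed.
pattern <- sub("^.*_", "", x)

print(literal)
print(pattern)

# After the regex version every element holds only digits, so it converts cleanly.
as_int <- as.integer(pattern)
```

The literal version leaves "Q2#_7" untouched, so converting its result with as.integer would give NA (with a warning) for that element.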
Emil Bode
0

You could use regular expressions:

df[["varX"]] <- sub(".+_", "", df[["varX"]])
df
  varX
1    1
2    5
3   10

Or without regular expressions, using strsplit() with fixed = TRUE:

df[["varX"]] <- sapply(df[["varX"]], function(x) strsplit(x, "_", fixed = TRUE)[[c(1, 2)]], USE.NAMES = FALSE)
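
Since strsplit() is already vectorised, the same idea can be sketched without calling it once per element. This is only an alternative under the assumption that every value contains exactly one underscore; the variable names are illustrative.

```r
# Sample values from the question.
x <- c("Q1#_1", "Q1#_5", "Q1#_10")

# strsplit() returns a list with one character vector per input string;
# fixed = TRUE treats "_" as a literal rather than a regular expression.
parts <- strsplit(x, "_", fixed = TRUE)

# Take the second piece of each split; vapply() checks that each result is
# a single character string.
after <- vapply(parts, function(p) p[2], character(1))
print(after)
```

If a value has no underscore, `p[2]` is NA, so it may be worth checking for missing values before converting to numbers.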
s_baldur