I have a dataframe column where I would like to keep only the last X characters for each row (in my case 7). The string is in the format xxxxxxx_xxxxxxx where the first and last 7 characters differ.

x <- data.frame("Var" = c("1970820_1970821", "1623789_1623777", "4862221_4862011", "4764567_4767067"))

I would like to reproduce:

data.frame("Var" = c("1970821", "1623777", "4862011", "4767067"))
Sean Zheng

1 Answer

We can use substring if the need is to extract a fixed number of characters from the end of each string

x$Var <- substring(x$Var, nchar(as.character(x$Var)) - 6)
x$Var
#[1] "1970821" "1623777" "4862011" "4767067"
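
The same fixed-width idea can be wrapped in a small helper so that the count (7 here) is not hard-coded as an offset. This is only a sketch: `last_n` is a hypothetical name, not a base R function, and the `pmax` guard for strings shorter than `n` is an added assumption rather than something the question asks for.

```r
# Hypothetical helper: keep the last n characters of each element.
last_n <- function(s, n) {
  s <- as.character(s)                 # also handles factor columns
  first <- pmax(nchar(s) - n + 1, 1)   # start position, never below 1
  substring(s, first)                  # substring() runs to the end by default
}

x <- data.frame("Var" = c("1970820_1970821", "1623789_1623777",
                          "4862221_4862011", "4764567_4767067"))
x$Var <- last_n(x$Var, 7)
x$Var
#[1] "1970821" "1623777" "4862011" "4767067"
```

Strings shorter than `n` come back unchanged with this version.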

Or with strsplit

x$Var <- sapply(strsplit(as.character(x$Var), "_", fixed = TRUE), `[`, 2)

Or another option with read.table

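# colClasses keeps the result as character (otherwise it is parsed as integer)
x$Var <- read.table(text = as.character(x$Var), sep = "_", header = FALSE, colClasses = "character")[, 2]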

Or remove everything up to and including the _, which also works when the number of digits varies

x$Var <- sub(".*_", "", x$Var)
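
To show where the delimiter-based pattern and the fixed-width approach diverge, here is a minimal sketch on made-up values (the vector `v` is illustrative and not from the question). Because `.*` is greedy, `sub(".*_", "", ...)` keeps whatever follows the last underscore, however long that is.

```r
# Illustrative input: varying lengths, and one value with two underscores
v <- c("1970820_1970821", "12_345", "a_b_c")

# Delimiter-based: strip everything through the last underscore
after_last <- sub(".*_", "", v)
after_last   # "1970821", "345", "c"

# Fixed-width: always takes (up to) the last 7 characters,
# so the short values keep their prefix and underscore
fixed_width <- substring(v, pmax(nchar(v) - 6, 1))
fixed_width  # "1970821", "12_345", "a_b_c"
```

For the question's data, where every value has exactly 7 characters after a single underscore, both give the same result.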

Or another option is word from stringr

library(stringr)
x$Var <- word(x$Var, 2, sep= "_")

Or another option is str_remove (also from stringr) inside mutate

library(dplyr)
library(stringr)  # str_remove comes from stringr, not dplyr
x %>%
   mutate(Var = str_remove(Var, ".*_"))
akrun