
I am trying to remove two digits from the middle of a string in a column in R. Specifically, I would like to remove the 4th and 5th characters from my Reference column, that is, the first two digits following the three letters.

What I have:

Reference
CHM2001011805
TBM2010071208
TBM2015050501
TQM2008080202
IRM2007050106
CKM2014090101
IRM1998050106
CHM1998011805

What I want:

Reference
CHM01011805 
TBM10071208
TBM15050501
TQM08080202
IRM07050106
CKM14090101
IRM98050106
CHM98011805

I found several solutions that remove the beginning or end of a string, as well as this one: Remove middle part of string in R

However, that only works when the characters before and after the part to remove are constant.

1 Answer


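We can use sub to capture the first 3 characters as a group (inside the parentheses), match the next two characters outside the group, and replace the whole match with the backreference to the captured group (\\1). The two characters that were matched but not captured are thereby dropped.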

df1$Reference <- sub("^(...)..", "\\1", df1$Reference)
df1$Reference
#[1] "CHM01011805" "TBM10071208" "TBM15050501" "TQM08080202" "IRM07050106" "CKM14090101" "IRM98050106" "CHM98011805"
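The pattern above is purely positional: it drops characters 4 and 5 whatever they are. If you want the match to also check the shape described in the question (three letters followed by two digits), a stricter pattern is possible. This is a sketch that assumes every reference starts with exactly three ASCII letters; the vector x is just the example data from the question.

```r
x <- c("CHM2001011805", "TBM2010071208", "TBM2015050501", "TQM2008080202",
       "IRM2007050106", "CKM2014090101", "IRM1998050106", "CHM1998011805")

# Capture exactly three letters at the start of the string, then match two
# digits that are not captured, so replacing with \\1 drops those digits.
# Strings that do not have this shape are returned unchanged by sub().
out <- sub("^([A-Za-z]{3})[0-9]{2}", "\\1", x)
out
```

On a data frame column this would be df1$Reference <- sub("^([A-Za-z]{3})[0-9]{2}", "\\1", df1$Reference).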

data

df1 <- structure(list(Reference = c("CHM2001011805", "TBM2010071208", 
"TBM2015050501", "TQM2008080202", "IRM2007050106", "CKM2014090101", 
"IRM1998050106", "CHM1998011805")), class = "data.frame", row.names = c(NA, 
-8L))
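A regex is not strictly required here. As an alternative sketch (not part of the original answer), the same positional removal can be done with base R substr() and substring(): keep characters 1 to 3 and everything from character 6 onward. This assumes, like the sub() version, that the two characters to drop are always at positions 4 and 5.

```r
x <- c("CHM2001011805", "TBM2010071208", "TBM2015050501", "TQM2008080202",
       "IRM2007050106", "CKM2014090101", "IRM1998050106", "CHM1998011805")

# substr(x, 1, 3) keeps the three-letter prefix; substring(x, 6) keeps
# everything from the 6th character to the end (its default last = 1000000L).
out <- paste0(substr(x, 1, 3), substring(x, 6))
out
```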
akrun