I have a bunch of strings of car models:
vec <- c("2010 mercedes-benz sl500r",
"2010 mercedes-benz e550",
"2010 mercedes-benz glk350",
"2010 mercedes-benz c300w",
"2010 mercedes-benz 300")
I want to remove everything that comes after the leading letters of the model name, so the desired output is:
c("2010 mercedes-benz sl",
"2010 mercedes-benz e",
"2010 mercedes-benz glk",
"2010 mercedes-benz c",
"2010 mercedes-benz 300")
The problem is that even when the make is the same (mercedes-benz), model names do not always have the same structure: they can start with 0 to 3 letters, and may or may not end with a letter. I want to remove everything that follows the leading letters, if there are any; a model with no leading letters (such as 300) should be left unchanged.
I've tried:
gsub("(?<=benz\\s\\D)\\w*", "", vec, perl=TRUE)
But that doesn't handle models that start with more than one letter, and this:
gsub("(?<=benz\\s\\D*)\\w*", "", vec, perl=TRUE)
is not a valid regex, for a reason explained here that I don't fully understand.
Any idea how to solve this?
I work in R.
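One direction that might sidestep the lookbehind problem, given here as a minimal sketch rather than a confirmed solution: PCRE requires a lookbehind to have a fixed length, so a quantifier like `*` inside `(?<=...)` is rejected. A capture group, or `\\K` with `perl = TRUE`, can keep the variable-length prefix instead. The sketch assumes that "benz" followed by one whitespace character always precedes the model, and that the letters to keep are the leading alphabetic run; the names `out_capture` and `out_k` are only illustrative.

```r
vec <- c("2010 mercedes-benz sl500r",
         "2010 mercedes-benz e550",
         "2010 mercedes-benz glk350",
         "2010 mercedes-benz c300w",
         "2010 mercedes-benz 300")

# Option 1: capture "benz", the whitespace, and the leading letters,
# then drop the remaining non-space characters of the model.
# A model with no leading letters does not match and is left unchanged.
out_capture <- sub("(benz\\s[[:alpha:]]+)\\S*", "\\1", vec, perl = TRUE)

# Option 2: \\K discards what was matched so far from the reported match,
# so only the trailing characters are replaced (requires perl = TRUE).
out_k <- sub("benz\\s[[:alpha:]]+\\K\\S*", "", vec, perl = TRUE)

print(out_capture)
# [1] "2010 mercedes-benz sl"  "2010 mercedes-benz e"   "2010 mercedes-benz glk"
# [4] "2010 mercedes-benz c"   "2010 mercedes-benz 300"
```

Both variants use `sub` rather than `gsub` because each string needs at most one replacement. If the number of letters really is capped at 3, `[[:alpha:]]{1,3}` could replace `[[:alpha:]]+`, although that would also truncate a longer letter prefix.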