Goal: I want to group a character vector like:
x <- c("a800k blue 5", "a800j", "bb-blah5", "a800 7", "bb-blah2", "bb-blah3")
into groups of "lead matches": the smallest set of leading substrings such that a grep search on each one returns every element of its group. So the solution to the toy example above would be:
solution <- c("a800", "bb-blah")
because a grep search of x using the pattern "a800" would return all 3 elements that start with "a800", and likewise for "bb-blah".
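To make the expected behaviour concrete, here is a small check of the toy example. It is only a sketch: I am assuming fixed = TRUE so that special characters in a pattern are treated literally, and note that grep matches anywhere in the string, not only at the start.

```r
x <- c("a800k blue 5", "a800j", "bb-blah5", "a800 7", "bb-blah2", "bb-blah3")
solution <- c("a800", "bb-blah")

# For each candidate lead, collect the elements of x that contain it.
# fixed = TRUE: treat the pattern as a literal string, not a regex.
matches <- lapply(solution, function(p) grep(p, x, fixed = TRUE, value = TRUE))
names(matches) <- solution
matches

# Every element of x should be covered by exactly one lead.
covered <- unlist(matches, use.names = FALSE)
setequal(covered, x)
```

If a strict "starts with" test is wanted instead, startsWith(x, "a800") would be the literal equivalent.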
Note: I can make very few assumptions about the strings in the vector. They vary in length from just a few characters to quite long (possibly 10 or more), and contain combinations of numbers, letters, spaces, and some special characters that make life very difficult.
So I would love a function that works something like intersect, maybe, but on each individual string.
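To illustrate what I mean by an intersect on individual strings, here is a rough sketch of just the pairwise piece. The helper name common_prefix is hypothetical (it is not an existing base R function), and this does not solve the grouping problem itself; it only returns the leading characters two strings share.

```r
# Hypothetical helper: longest shared leading substring of two strings.
common_prefix <- function(a, b) {
  ca <- strsplit(a, "")[[1]]   # split into single characters
  cb <- strsplit(b, "")[[1]]
  n <- min(length(ca), length(cb))
  if (n == 0) return("")
  same <- ca[seq_len(n)] == cb[seq_len(n)]
  # Number of leading positions that agree before the first mismatch
  k <- if (all(same)) n else which(!same)[1] - 1
  paste(ca[seq_len(k)], collapse = "")
}

common_prefix("a800k blue 5", "a800j")
common_prefix("bb-blah5", "bb-blah2")
common_prefix("a800j", "bb-blah5")
```

Because it compares characters directly rather than building a regex, special characters in the strings should not need escaping.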
Any thoughts?