I'm going to make the assumption that you want to extract a single wildcard that will work for all your strings. If not, I don't know how to get started.
Also, for this first pass, I will correct your English, changing "apple pie is not too so good"
to "apple pie is not very good"
. The reason will be clear below.
To solve this, we can break it down to finding a common prefix to all the strings, and a common suffix to them all, and joining these together with our *
wildcard character. It might look like this:
const reverse = (s) => [...s] .reverse () .join ('')
const commonPrefix = ([first, ...rest]) =>
first .slice (0, [... first] .findIndex ((c, i) => rest .some (s => s .at (i) !== c)))
const commonSuffix = (ss) =>
reverse (commonPrefix (ss .map (reverse)))
const extractWildcard = (str) =>
commonPrefix (str) + '*' + commonSuffix (str)
console .log (extractWildcard ([
'apple pie is not very good',
'apple pie is not so good',
'apple pie is too good'
])) //=> 'apple pie is * good'
Our commonPrefix
function finds the first index at which some string is different from the first one, then slices the string up to that point. commonSuffix
uses a string reverse
helper function and reverses all the strings, finds their common prefix and reverses that value. (There are many other ways of doing this, some of them much more efficient, but they involve calculating the longest string to start, and are all more complex. If there was a performance problem here, I would look to use one of them instead.)
But there is a problem here, and it's shown by using your original "apple pie is not too so good"
:
extractWildcard ([
'apple pie is not too so good',
'apple pie is not so good',
'apple pie is too good'
]) //=> "apple pie is *o good"
The extra o
next to the wildcard is unwanted. To fix that, we might make a more complex version of extractWildcard
that deals with spaces:
const extractWildcard2 = (s, pre = commonPrefix (s), suf = commonSuffix (s)) =>
pre .slice (0, pre .lastIndexOf (' ')) + ' * ' + suf .slice (suf .lastIndexOf (' ') + 1)
extractWildcard2 ([
'apple pie is not too so good',
'apple pie is not so good',
'apple pie is too good'
]) //=> "apple pie is * good"
Update
Here is a more efficient version of commonSuffix
:
const commonSuffix = (ss, length = Math .max (...ss .map (s => s .length))) =>
(ss [0] || '') .slice (length - Array .from ({length}, (_, i) => i + 1) .findIndex (
(i) => console .log ({i}) || ss .some (s => s .at (-i) !== ss [0] .at (-i))
))
I would stick with the simpler one unless this proved to be an actual bottleneck in an application, though.
Update 2: Simplicity
(This is not about the actual problem, but a discussion on code simplicity; please feel free to ignore it.)
A comment said:
It is pretty much unreadable code even for me [...]. simpler, more readable algorithms do exist for common prefix.
I would like to argue this point. Rich Hickey has a famous talk, Simple Made Easy, which distinguishes "simple" from "easy". Simple is a fairly objective measure about how few concepts are twined together to provide a solution. Easy is a subjective measure of how familiar the solution seems to a particular person.
By writing declarative code, by avoiding techniques like mutable state, by using expressions rather than statements, and by avoiding temporal effects of variable assignment, I'd argue that this code is fairly simple. It may not be familiar, and therefore may be less easy for some users.
I wrote the code above as it stands; I didn't start somewhere and whittle it down to a minimum. But we can try that. We can start with the most imperative -- and to some, most familiar -- style of code:
const commonPrefix = (strings) => {
const firstString = strings [0];
const first = firstString .split ('');
const rest = strings .slice (1);
let index = -1;
let indexFound = false;
for (let i = 0; i < first .length; i ++) {
const character = first [i];
for (let j = 0; j < rest .length; j ++) {
const testString = rest [j];
const testCharacter = testString [i]
if (testCharacter !== character) {
index = i;
indexFound = true;
break;
}
}
if (indexFound) {
break;
}
}
const characters = first .slice (0, index);
const result = characters .join ('');
return result;
}
This may be more familiar/easy to some. But let's look at the implementation. We have twelve local variables. We use a pair of nested loops, each including an if
statement. We perform early escapes in both loops in certain circumstances.
If we do a slight bit of cleanup, and recognize that the outer loop is just doing the job that Array.prototype.findIndex
already does for us, then we can write what I would argue is a simpler version:
const commonPrefix = (strings) => {
const firstString = strings [0]
const first = firstString .split ('')
const rest = strings .slice (1)
const index = first .findIndex ((c, i) => {
for (let j = 0; j < rest .length; j ++) {
const testString = rest [j]
if (testString [i] !== c) {
return true
}
}
return false
})
return first .slice (0, index) .join ('')
}
We've also eliminated some local variables: testCharacter
, characters
, and result
, with no detriment in readability.
Once again, we should be able to notice that the remaining loop is performing the same job as Array.prototype.some
. We can clean that up to leave:
const commonPrefix = (strings) => {
const first = strings [0] .split ('')
const rest = strings .slice (1)
const index = first .findIndex ((c, i) => rest .some (s => s .at (i) !== c))
return first .slice (0, index) .join ('')
}
We're now approaching what I think of as a simple function. But we can notice that index
is defined in one line and used only once, in the very next line, so if we choose, we can simply fold it in, and remove the local variable. Similarly, the parameter strings
is only used to define first
and rest
; we can achieve the same with parameter destructuring, and skip these definitions. At this point, we get back to my initial version:
const commonPrefix = ([first, ...rest]) =>
first .slice (0, [... first] .findIndex ((c, i) => rest .some (s => s .at (i) !== c)))
Note that this version doesn't use first .split ('')
, but instead [... first]
. Either will work; and either seems equally simple. This code uses s .at (i)
instead of s [i]
, since an earlier verion had a parallel commonSuffix
where the at
was much easier to use. But we could change that; either choice seems fine.
In all, this code is much simpler than the longest version. It's not the length itself which is important. But a common side-effect of simpler code is shorter code.
Update 3: White-space style
A comment found that my white-space styling detracts from readability. Just for comparison, here's the same code in a more traditional formatting:
const reverse = (s) => [...s].reverse().join('')
const commonPrefix = ([first, ...rest]) =>
first.slice(0, [... first].findIndex((c, i) => rest.some(s => s.at(i) !== c)))
const commonSuffix = (ss) =>
reverse(commonPrefix(ss.map(reverse)))
const extractWildcard = (str) =>
commonPrefix(str) + '*' + commonSuffix(str)
console.log(extractWildcard([
'apple pie is not very good',
'apple pie is not so good',
'apple pie is too good'
])) //=> 'apple pie is * good'
I find that over-dense, but I know many others prefer it.