I'm looking for an algorithm that matches files to specific "buckets" based on identifiers attached to those buckets.
Example:
const buckets = [
{ id: 1, identifiers: ['jhdi753hhdy', 'foo', 'u-123'] },
{ id: 2, identifiers: ['hasd834kasd', 'bar', 'u-112'] },
{ id: 3, identifiers: ['adf8wersbay', 'buz', 'u-234', 'u-112'] },
]
bestMatch('file-jhdi7-53hhdy', buckets) // => [{ id: 1, ... }]
bestMatch('file-u112', buckets) // => [{ id: 2, ... }, { id: 3, ... }]
bestMatch('isBUZZfile', buckets) // => [{ id: 3, ... }]
Given a filename, the algorithm should return one or more matching buckets, sorted from best to worst, based on how well the filename matches each bucket's identifiers.
I've already considered a naive implementation based on string matches, but that will only get me so far.
I'm fairly open to any type of solution, whether purely client side (in memory) or server side using the capabilities of a specific search engine.
Some considerations for the algorithm:
- The same identifier may appear in more than one bucket (like 'u-112' in the example)
- Tokens may not be an exact match, but exact matches should be considered "better"
- If the algorithm can say which identifiers it used to match a result, that would be a very nice plus
- Having a "confidence" or "weight" for each match would be a very nice plus
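
To make the considerations above concrete, here is a minimal in-memory sketch of one possible approach, not a definitive implementation. It assumes that stripping punctuation and case is acceptable, and it uses approximate substring matching (edit distance against any substring of the filename) for non-exact matches. The helper names, the result shape ({ bucket, score, matchedBy }), the weights 1 / 0.9 / 0.8 and the 0.6 cut-off are all arbitrary choices made for illustration.

```javascript
// Lowercase and drop everything except letters and digits, so that
// 'file-jhdi7-53hhdy' and 'jhdi753hhdy' become comparable.
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Smallest edit distance between `pattern` and any substring of `text`.
// This is Levenshtein with an all-zero first row, so a match may start anywhere.
function substringDistance(pattern, text) {
  let prev = new Array(text.length + 1).fill(0);
  for (let i = 1; i <= pattern.length; i++) {
    const curr = [i];
    for (let j = 1; j <= text.length; j++) {
      const cost = pattern[i - 1] === text[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return Math.min(...prev);
}

// Score in [0, 1]. The weights are assumptions: a raw exact hit beats a
// normalized exact hit, which beats any fuzzy hit.
function scoreIdentifier(identifier, filename) {
  const id = normalize(identifier);
  const name = normalize(filename);
  if (id.length === 0) return 0;
  if (filename.includes(identifier)) return 1;
  if (name.includes(id)) return 0.9;
  const similarity = 1 - substringDistance(id, name) / id.length;
  return 0.8 * similarity;
}

// Returns matching buckets sorted by score (best first), with the
// identifier that produced the score. minScore is an arbitrary cut-off.
function bestMatch(filename, buckets, minScore = 0.6) {
  const results = [];
  for (const bucket of buckets) {
    let best = { score: 0, identifier: null };
    for (const identifier of bucket.identifiers) {
      const score = scoreIdentifier(identifier, filename);
      if (score > best.score) best = { score, identifier };
    }
    if (best.score >= minScore) {
      results.push({ bucket, score: best.score, matchedBy: best.identifier });
    }
  }
  // Array.prototype.sort is stable, so ties keep the input bucket order.
  return results.sort((a, b) => b.score - a.score);
}

const buckets = [
  { id: 1, identifiers: ['jhdi753hhdy', 'foo', 'u-123'] },
  { id: 2, identifiers: ['hasd834kasd', 'bar', 'u-112'] },
  { id: 3, identifiers: ['adf8wersbay', 'buz', 'u-234', 'u-112'] },
];
console.log(bestMatch('file-u112', buckets).map((r) => r.bucket.id)); // [ 2, 3 ]
```

With these weights, the three example filenames above resolve to buckets 1, 2 and 3, and 3 respectively, each through a normalized exact match (score 0.9). The edit-distance step costs roughly identifier length times filename length per comparison, so for a large number of buckets a search engine with fuzzy matching may be the better fit.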