Is there a better way to check for similarities within an array?

Question

I am getting a response that returns an array of hashes. Each hash has two keys, "title" and "paragraph". Sometimes the responses contain similar values under the paragraph key.

For example when I just return the values in the paragraph:

["Welcome to the best place", "Welcome to the best place in the world, Boston!"]

You can see that the value at index 1 includes the value at index 0.

I am mapping over the array of hashes to return one of the keys, "paragraph". I then try to filter out the first element if its value is contained in any of the other elements in the array. What I have only works when the array has similar values, as stated above, and returns an empty array otherwise.

const description = hotel
    .description()
    .map(descriptions => descriptions.paragraph)
    .filter((paragraph, index) => !paragraph[index].includes(paragraph[0]))

Where hotel.description() returns the array of hashes, and the map-then-filter chain returns the results in an array.

The example code above returns a valid response where the array:

["Welcome to the best place", "Welcome to the best place in the world, Boston!"]

Becomes:

["Welcome to the best place in the world, Boston!"]

But if the values in the array are all unique, an empty array is returned.

The expected results are:

["You are here at the best place", "Welcome to the best place in the world, Boston!"]

The actual results are: []

Not sure what else to append to this chain to get it to return the unique values.

Do you want to get all the words that are common to title and paragraph? — Murtaza Hussain, May 09 '19 at 15:44
@MurtazaHussain I want to return the array for paragraph if all values are unique, and filter out the value that is similar with less length. — Rembrandt Reyes, May 09 '19 at 15:46
I'd have to think a bit more about how to achieve what you want, but the reason I think your code is failing is that `paragraph[index]` and `paragraph[0]` are characters in your current paragraph, not entries in a list of paragraphs. — Scott Sauyet, May 09 '19 at 15:49
So for [ABC, AB, DEF, DEFG] you're expecting [ABC, DEFG]? Also, using paragraph[0] makes little sense indeed. — user1514042, May 09 '19 at 15:50
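
To make the point from the comments concrete, here is a small sketch of what the filter callback in the question actually compares. Inside `.filter`, `paragraph` is a single string, so indexing it yields single characters rather than other paragraphs. The array is the example from the question; the names `paragraphs` and `comparisons` are only for illustration.

```javascript
const paragraphs = [
  "Welcome to the best place",
  "Welcome to the best place in the world, Boston!"
];

// Record what the question's callback compares for each element.
const comparisons = paragraphs.map((paragraph, index) => ({
  left: paragraph[index],  // character at position `index` of this string
  right: paragraph[0],     // first character of this same string
  kept: !paragraph[index].includes(paragraph[0])
}));

console.log(comparisons);
// [ { left: 'W', right: 'W', kept: false },
//   { left: 'e', right: 'W', kept: true } ]
```

The first element is dropped because a character always includes itself, not because the two paragraphs are similar.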

pwilcox · Accepted Answer · 2019-05-09T16:13:26.337


I'm simplifying your example to work with it, but the concept still applies here. I'm also making the following assumptions:

"Similar" means "includes"
You would be interested in all similarities, not just similarity with the first
Your original data has no strict duplicate phrases (this can be worked around though)
You prefer to remove the subset phrases and keep the superset phrases (if this makes sense).

If so, then the following approach seems to work for your needs:

let greetings = [
  "Welcome to the best place", 
  "Welcome to the best place in the world, Boston!"
];

let condensed = 
  greetings
  .filter(g => 
    !greetings.some(other => other.includes(g) && !(other == g))
  );

console.log(condensed);

And here it does not return an empty array when none of the values are similar:

let greetings = [
  "You're at the best place", 
  "Welcome to the best place in the world, Boston!"
];

let condensed = 
  greetings
  .filter(g => 
    !greetings.some(other => other.includes(g) && !(other == g))
  );

console.log(condensed);
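
As a rough sketch of how the same filter might slot into the chain from the question: the `descriptions` array below is a hypothetical stand-in for whatever `hotel.description()` returns, and the titles are invented for the example.

```javascript
// Hypothetical stand-in for the result of hotel.description()
const descriptions = [
  { title: "Intro", paragraph: "Welcome to the best place" },
  { title: "Overview", paragraph: "Welcome to the best place in the world, Boston!" }
];

// Pull out the paragraph strings first, so the filter can look at all of them.
const paragraphs = descriptions.map(d => d.paragraph);

// Keep a paragraph only if no *other* paragraph contains it.
const description = paragraphs.filter(p =>
  !paragraphs.some(other => other !== p && other.includes(p))
);

console.log(description);
// [ 'Welcome to the best place in the world, Boston!' ]
```

Storing the mapped array in a variable is what lets the filter callback compare each paragraph against the whole list, which a single unbroken chain cannot do without the third `array` argument of the callback.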

edited May 09 '19 at 16:13

answered May 09 '19 at 16:01


Tested this against some use cases and it works wonderfully. Thank you. – Rembrandt Reyes May 09 '19 at 16:12
Great! I made a simple edit to the answer after you commented. Changed "startswith" to "includes". Didn't change the output in this case, and I think it better meets your needs given it's what you did in your question. Don't forget to accept this as the answer if you feel it deserves it. – pwilcox May 09 '19 at 16:16
Regarding the strict duplicates, that "!(other == g)" expression prevents the array from becoming empty. Yet it also makes it so all instances of strict duplicates will output in the result. So you'll have to clean your array beforehand or clean the results afterward to get rid of the duplicates. See [here](https://stackoverflow.com/questions/1960473/get-all-unique-values-in-a-javascript-array-remove-duplicates) for how to do that. – pwilcox May 09 '19 at 16:26
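
A minimal sketch of the "clean your array beforehand" option from the comment above, assuming a `Set` is an acceptable way to drop strict duplicates. The sample data is made up for illustration.

```javascript
const greetings = [
  "Welcome to the best place",
  "Welcome to the best place",
  "Welcome to the best place in the world, Boston!",
  "You're at the best place",
  "You're at the best place"
];

// Remove strict duplicates first; a Set keeps the first occurrence of each value.
const unique = [...new Set(greetings)];

// Then drop every phrase that is contained in a different phrase.
const condensed = unique.filter(g =>
  !unique.some(other => other.includes(g) && other !== g)
);

console.log(condensed);
// [ 'Welcome to the best place in the world, Boston!', "You're at the best place" ]
```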

Scott Sauyet · Answer 2 · 2019-05-09T16:14:39.357

This is one possibility. I separate the detection of similarity, and the choice of the better of two similar items, from the logic of keeping only one item out of each similar pair. The function includes simply reports whether one of two strings is a substring of the other, and longer chooses the longer of two strings.

Obviously those helper functions can be embedded back into the main function, but I think this is more logical.

const keepSimilar = (similarTo, better) => (xs) => 
  xs.reduce((found, x) => {
    const index = found.findIndex(similarTo(x))
    if (index > -1) {
      found[index] = better(x, found[index])
    } else {
      found.push(x)
    }
    return found
  }, [])

const includes = (s1) => (s2) => s1.includes(s2) || s2.includes(s1)
const longer = (s1, s2) => s2.length > s1.length ? s2 : s1 

const similarParas = keepSimilar(includes, longer)

const paras = ['foo', 'bar', 'baz', 'foobar', 'bazqux']

console.log(similarParas(paras)) //=> ['foobar', 'bar', 'bazqux']
console.log(similarParas(['ABC', 'AB', 'DEF', 'DEFG'])) //=> ['ABC','DEFG']
console.log(similarParas([
  'Welcome to the best place', 
  'Welcome to the best place in the world, Boston!'
]))
//=> ['Welcome to the best place in the world, Boston!']

console.log(similarParas([
  'You are here at the best place', 
  'Welcome to the best place in the world, Boston!'
]))
//=> ['You are here at the best place', 'Welcome to the best place in the world, Boston!']

This is not very pretty code. I'm one of the principals of Ramda, and I would do it very differently with a library like that, especially avoiding mutation of the accumulator object. But this should work.
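
For what it's worth, here is one way the accumulator mutation could be avoided in plain JavaScript. This is only a sketch and not the Ramda version mentioned above; `keepSimilarPure` is a hypothetical name, and the two helpers are repeated so the snippet runs on its own.

```javascript
// Same idea as keepSimilar, but each step returns a new array
// instead of changing `found` in place.
const keepSimilarPure = (similarTo, better) => (xs) =>
  xs.reduce((found, x) => {
    const index = found.findIndex(similarTo(x));
    return index > -1
      ? found.map((y, i) => (i === index ? better(x, y) : y)) // replace one slot
      : [...found, x];                                         // append a new item
  }, []);

const includes = (s1) => (s2) => s1.includes(s2) || s2.includes(s1);
const longer = (s1, s2) => (s2.length > s1.length ? s2 : s1);

const input = ['ABC', 'AB', 'DEF', 'DEFG'];
const output = keepSimilarPure(includes, longer)(input);

console.log(output); //=> ['ABC', 'DEFG']
```

The trade-off is an extra array allocation per element, which is unlikely to matter for a handful of paragraphs.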

I like the idea of "better" - CE - continuous enchantment — user1514042, May 10 '19 at 11:10

user1514042 · Answer 3 · 2019-05-10T09:05:07.470

Here's how you do it in 'one go', using reduce. Note that because it relies on startsWith, a paragraph is only treated as similar when it is a prefix of a longer one:

const result =
    [{ paragraph: "D" }, { paragraph: "A" }, { paragraph: "ABC" }, { paragraph: "AB" }, { paragraph: "A" }, { paragraph: "DEFG" }, { paragraph: "DE" }]
        .map(({ paragraph }) => paragraph)
        // Sort descending so that a longer string comes before its own prefixes.
        .sort()
        .reverse()
        .reduce((existingParagraphs, currentParagraph) => {
            // Keep the current paragraph only if no kept paragraph starts with it.
            // (some() is already false on an empty array, so no length check is needed.)
            if (!existingParagraphs.some(existingParagraph => existingParagraph.startsWith(currentParagraph))) {
                existingParagraphs.push(currentParagraph);
            }
            return existingParagraphs;
        }, []);

console.log(result); //=> ['DEFG', 'ABC']
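
If a shorter paragraph can appear in the middle of a longer one rather than at its start, a variation on the same sort-then-reduce idea is to order by length and test with `includes`. This is a sketch under that assumption, not part of the answer above; the sample data and the name `condensedByLength` are made up.

```javascript
// Example input; any array of strings works.
const paragraphs = [
  "the best place",
  "Welcome to the best place",
  "DE",
  "D",
  "DE"
];

const condensedByLength = [...paragraphs]   // copy, so the input is not reordered
  .sort((a, b) => b.length - a.length)      // longest first
  .reduce(
    // Skip a paragraph if something already kept contains it.
    (kept, p) => (kept.some(k => k.includes(p)) ? kept : [...kept, p]),
    []
  );

console.log(condensedByLength);
// [ 'Welcome to the best place', 'DE' ]
```

Because every string includes itself, this also removes strict duplicates. The result comes out ordered by length, not in the original order.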
