
I am building an application to identify duplicate and unique data in a JSON file, and I want to output the number of unique records.

I have a JSON object that contains many first names and last names. I want to identify duplicates, and records whose names are merely similar should also be treated as the same. For example:

 [
   {FirstName: 'Joshua', LastName: 'smith'},
   {FirstName: 'Joshuaa', LastName: 'smith'}
 ]

As you can see above, the second object has an extra 'a', but I want it to be treated as the same record as the first object. In other words, the comparison should tolerate typos in both FirstName and LastName.

I thought about using Regex but I can't figure out where to use it.

Joshua Newman
  • "I thought about using Regex but I cant figure where to use it." - Well in the same code/function where you will parse this data of course! :) – Anurag Srivastava Mar 04 '19 at 09:51
  • The tough part is how "similar" you want it to be. – holydragon Mar 04 '19 at 09:52
  • Please read https://stackoverflow.com/a/3576273/2506522 – betontalpfa Mar 04 '19 at 09:57
  • Hi guys. In my application, the data I read comes from a CSV file. My application is a React app with an import feature: when the user clicks the file to import, it is parsed into a state object. I would like records to count as similar when they differ by one letter in each of firstName and lastName. – Joshua Newman Mar 04 '19 at 09:59

2 Answers


You can do this by setting a THRESHOLD value for how many differing characters still count as similar. I set it to 1 in this example:

const array = [
    { FirstName: 'Joshua', LastName: 'smith' },
    { FirstName: 'Joshuaa', LastName: 'smith' }
];

const THRESHOLD = 1;

// `record` is the entry to check against every other entry in `array`.
// (Renamed from `document` to avoid shadowing the browser global.)
const compareCollections = (record) => {
    array.forEach(element => {
        // Skip comparing the record with itself.
        if (element === record) {
            return;
        }

        let consideredSimilar = false;

        if (element.FirstName === record.FirstName) {
            // First names match exactly, so any typo must be in the last name.
            if (_checkDifferences(element.LastName, record.LastName) <= THRESHOLD) {
                console.log('SIMILAR LASTNAME');
                consideredSimilar = true;
            }
        } else if (element.LastName === record.LastName) {
            // Last names match exactly, so any typo must be in the first name.
            if (_checkDifferences(element.FirstName, record.FirstName) <= THRESHOLD) {
                console.log('SIMILAR FIRSTNAME');
                consideredSimilar = true;
            }
        }

        console.log('CONSIDERED SIMILAR: ', consideredSimilar);
    });
};

// Counts the positions at which the two strings differ. This is a
// position-by-position comparison, so a missing or extra trailing
// character also counts as one difference.
const _checkDifferences = (first, second) => {
    const length = Math.max(first.length, second.length);

    let differences = 0;

    for (let index = 0; index < length; index++) {
        // Indexing past the end of the shorter string yields undefined,
        // which never equals a character.
        if (first[index] !== second[index]) {
            differences++;
        }
    }

    return differences;
};

compareCollections(array[1]);
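
One limitation of the position-by-position count above is that an extra letter in the middle of a name shifts every later character: comparing 'Joshua' with 'Jooshua' that way reports 5 differences, even though it is a single typo. A common alternative, not part of the answer above, is the Levenshtein edit distance. The following is a minimal sketch; the name `editDistance` is hypothetical and chosen only for illustration.

```javascript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn `a` into `b`.
// `editDistance` is a hypothetical helper name, not from the answer above.
const editDistance = (a, b) => {
    // previous[j] = distance between the first (i - 1) chars of a
    // and the first j chars of b. Row 0 is simply 0, 1, 2, ...
    let previous = Array.from({ length: b.length + 1 }, (_, j) => j);

    for (let i = 1; i <= a.length; i++) {
        // current[0] = i: turning i chars of a into an empty string.
        const current = [i];
        for (let j = 1; j <= b.length; j++) {
            const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
            current[j] = Math.min(
                previous[j] + 1,                    // delete a[i - 1]
                current[j - 1] + 1,                 // insert b[j - 1]
                previous[j - 1] + substitutionCost  // substitute or match
            );
        }
        previous = current;
    }

    return previous[b.length];
};

console.log(editDistance('Joshua', 'Joshuaa')); // 1
console.log(editDistance('Joshua', 'Jooshua')); // 1
```

It could be dropped in wherever `_checkDifferences` is called, keeping the same THRESHOLD comparison.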
Dave

If we are talking about duplicates, let's first clarify what counts as a duplicate. I can imagine a situation where a person's real name is "Joshuaa". For your question, you might need some sort of Bayesian filter.

As for me, I would simply convert your array into an object keyed by last name (it's cheap) and then back into an array.

const array = [
    { FirstName: 'Joshua', LastName: 'smith' },
    { FirstName: 'Joshuaa', LastName: 'smith' }
];

const test = array.reduce((acc, el) => ({
    ...acc,
    [el.LastName]: { ...el }
}), {});
const output = Object.values(test);
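
Keying on LastName alone keeps one record per last name, so two unrelated people who share a surname would be merged. To get the number of unique records the question asks for, one possible approach is a greedy pass that keeps a record only when it is not within one typo of an already kept record in both fields. This is a rough sketch under stated assumptions: `editDistance`, `isSimilar`, `uniqueRecords`, and the sample `people` array are hypothetical names, and lowercasing before comparison is an assumption, not something the question specifies.

```javascript
// Compact Levenshtein distance (hypothetical helper for this sketch).
const editDistance = (a, b) => {
    let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const current = [i];
        for (let j = 1; j <= b.length; j++) {
            current[j] = Math.min(
                previous[j] + 1,                                   // deletion
                current[j - 1] + 1,                                // insertion
                previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution or match
            );
        }
        previous = current;
    }
    return previous[b.length];
};

// Assumption: at most one typo is tolerated in each field.
const THRESHOLD = 1;

// Two records are similar when BOTH names are within THRESHOLD edits.
// Assumption: case differences are ignored.
const isSimilar = (x, y) =>
    editDistance(x.FirstName.toLowerCase(), y.FirstName.toLowerCase()) <= THRESHOLD &&
    editDistance(x.LastName.toLowerCase(), y.LastName.toLowerCase()) <= THRESHOLD;

// Greedy pass: keep a record only if no already kept record is similar to it.
const uniqueRecords = (records) =>
    records.reduce(
        (kept, record) => (kept.some(k => isSimilar(k, record)) ? kept : [...kept, record]),
        []
    );

// Hypothetical sample data.
const people = [
    { FirstName: 'Joshua', LastName: 'smith' },
    { FirstName: 'Joshuaa', LastName: 'smith' },
    { FirstName: 'Anna', LastName: 'smith' }
];

console.log(uniqueRecords(people).length); // 2
```

Because "similar" is not transitive (A can be close to B and B to C without A being close to C), the count from a greedy pass can depend on the order of the input. Each record is compared against every kept record, so the cost grows quadratically with the number of unique entries.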