
I have put a general title for a good search, but my question is a bit more specific.

I have got one array:

var keywords= ["Anglo-Saxon English","American English","British English","Canadian English","Fast British English","Austrian German","American Football","British English","Fast British English","Blue Jeep","Hot Summer","Mild Summer","Hot Brazilian Summer"];

and another array:

var terms = ["english","english","english","english","english","german","football","british english","british english","jeep","summer","summer","summer"];

Each keyword corresponds to the term at the same index, and every keyword contains its term. For example, the 'Anglo-Saxon English' keyword corresponds to the 'english' term, 'American Football' corresponds to the 'football' term, etc. However, the keywords array contains some duplicates. In this example the duplicate keywords are 'British English', which corresponds to both the 'english' term and the 'british english' term, and 'Fast British English', which also corresponds to both 'english' and 'british english'. There can be any number of duplicates (I have used 2 for simplicity). All keywords have their respective terms in the same order in the arrays, so both arrays have the same length.

My question is how to keep only one element among duplicates: the one with the more specific corresponding term. For instance, 'british english' is a more specific term than just 'english', so I want to remove the 'British English' (keywords[2]) duplicate keyword and its respective term 'english' (terms[2]) from both arrays, and keep only the 'British English' (keywords[7]) keyword with its term 'british english' (terms[7]).

UPDATE: With the solution offered by Tibos below, I came up with this working fiddle: http://jsfiddle.net/ZqEhQ/. However, 'Fast British English' and 'British English' still get picked up by the 'Languages' category instead of the 'Car' category, which has the more specific 'british english' term versus the plain 'english' term of the 'Languages' category. Any ideas?

Kanan Farzali
  • Use an associative array – Jacob Jan 16 '14 at 10:31
  • There are no "associative arrays" in javascript. There are objects – Alma Do Jan 16 '14 at 10:32
  • [google](https://www.google.de/search?q=javascript+array+remove+duplicate+entries) gives plenty of Stackoverflow questions, e.g. http://stackoverflow.com/q/9229645/1741542, http://stackoverflow.com/q/16747798/1741542, ... – Olaf Dietsche Jan 16 '14 at 10:34
  • The title is a little misleading, so you may want to re-name the question to "How to remove duplicates from 2 corresponding arrays", or something like that. – Nahn Jan 16 '14 at 13:16
  • @AlmaDo Isn't everything in Javascript an object? :) – Nahn Jan 16 '14 at 13:22
  • @Nahn well, no. Such things as pure `20` or `75` are definitely not objects – Alma Do Jan 16 '14 at 13:23

4 Answers


Having items depend on one another's order in different arrays is generally a bad idea because it is very difficult to maintain. I would suggest using a different structure for your data:

var data= [
    { keyword : "Anglo-Saxon English", term : 'english', category : 'Language' },
    { keyword : "American English", term : 'english', category : 'Language'  },
    { keyword : "Fast British English", term : 'english', category : 'Sport' },
    { keyword : "British English", term : 'english', category : 'Language' },
    { keyword : "British English", term : 'british english', category : 'Language' },
    { keyword : "Fast British English", term : 'british english', category : 'Sport' },
    { keyword : "Canadian English", term : 'french', category : 'Sport' }
];
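If the data is still stored in the question's two parallel arrays, a minimal sketch of converting it into this array-of-objects shape could look like the following. The helper name `zipKeywordsAndTerms` is hypothetical, and the sketch carries only `keyword` and `term` (the question's arrays have no category).

```javascript
// The two parallel arrays from the question.
var keywords = ["Anglo-Saxon English","American English","British English","Canadian English","Fast British English","Austrian German","American Football","British English","Fast British English","Blue Jeep","Hot Summer","Mild Summer","Hot Brazilian Summer"];
var terms = ["english","english","english","english","english","german","football","british english","british english","jeep","summer","summer","summer"];

// Hypothetical helper: pair the arrays index by index so each keyword
// travels together with its term instead of relying on array order.
function zipKeywordsAndTerms(keywords, terms) {
    if (keywords.length !== terms.length) {
        throw new Error("keywords and terms must have the same length");
    }
    return keywords.map(function (keyword, i) {
        return { keyword: keyword, term: terms[i] };
    });
}

var data = zipKeywordsAndTerms(keywords, terms);
// data[7] is { keyword: "British English", term: "british english" }
```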

Since your final data contains unique keywords, I would use yet another data structure to hold it:

Expected output:

var uniques = {
    "American English": "english",
    "Anglo-Saxon English": "english",
    "British English": "british english",
    "Canadian English": "french",
    "Fast British English": "british english"
};

Some way to get from input to expected output:

var uniques = {};
data.forEach(function(item){
    if (isMoreSpecific(item.term, uniques[item.keyword])) {
        uniques[item.keyword] = item.term;
    }
});

function isMoreSpecific(term, reference) {
    return !reference || term.indexOf(reference) !== -1;
}

You can obviously change the isMoreSpecific function if you don't agree with my definition, or if your logic for defining specificity changes. You could even inline it, though I prefer the function for clarity in this case.
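As an illustration of swapping the definition, here is a minimal sketch assuming a hypothetical length-based rule (the same idea used in another answer below): a longer term counts as more specific, whether or not it contains the shorter one. Because the comparison is strict, a tie keeps the term that was seen first.

```javascript
// Hypothetical alternative rule: a longer term is more specific.
// Strict ">" means that on equal length the first term seen is kept.
function isMoreSpecificByLength(term, reference) {
    return !reference || term.length > reference.length;
}

var data = [
    { keyword: "British English", term: "english" },
    { keyword: "British English", term: "british english" },
    { keyword: "Hot Brazilian Summer", term: "summer" }
];

var uniques = {};
data.forEach(function (item) {
    if (isMoreSpecificByLength(item.term, uniques[item.keyword])) {
        uniques[item.keyword] = item.term;
    }
});
// uniques: { "British English": "british english", "Hot Brazilian Summer": "summer" }
```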


Note: the solution above can be quite easily adapted to work with the two arrays you have originally. Simply iterate using a for loop over one array to build the uniques object, then rebuild the arrays from it.
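A minimal sketch of that adaptation, assuming the original `keywords` and `terms` arrays and the same `isMoreSpecific` rule: a plain for loop builds the `uniques` object, and the arrays are then rebuilt from its keys. The names `newKeywords` and `newTerms` are illustrative. The rebuilt order relies on `Object.keys` returning non-numeric string keys in insertion order, which current engines do.

```javascript
var keywords = ["Anglo-Saxon English","American English","British English","Canadian English","Fast British English","Austrian German","American Football","British English","Fast British English","Blue Jeep","Hot Summer","Mild Summer","Hot Brazilian Summer"];
var terms = ["english","english","english","english","english","german","football","british english","british english","jeep","summer","summer","summer"];

// Same rule as above: a term is more specific if it contains the reference term.
function isMoreSpecific(term, reference) {
    return !reference || term.indexOf(reference) !== -1;
}

// Build keyword -> most specific term by walking both arrays in parallel.
var uniques = {};
for (var i = 0; i < keywords.length; i++) {
    if (isMoreSpecific(terms[i], uniques[keywords[i]])) {
        uniques[keywords[i]] = terms[i];
    }
}

// Rebuild two parallel arrays (now without duplicates) from the map.
var newKeywords = Object.keys(uniques);
var newTerms = newKeywords.map(function (keyword) {
    return uniques[keyword];
});
```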


Solution for category inclusion with the keyword:

var uniques = {};
data.forEach(function(item){
    var serialized = JSON.stringify({key:item.keyword, cat:item.category});
    if (isMoreSpecific(item.term, uniques[serialized])) {
        uniques[serialized] = item.term;
    }
});

var keywordcategory = {};
for (var serialized in uniques) {
    var obj = JSON.parse(serialized);
    keywordcategory[obj.key] = obj.cat;
}

DEMO: http://jsbin.com/ODoDIXi/1/edit

If you can assume that the same keyword is only in one category, there is no need for serialization:

var uniques = {};
data.forEach(function(item){
    var current = uniques[item.keyword]; // undefined the first time a keyword is seen
    if (isMoreSpecific(item.term, current && current.term)) {
        uniques[item.keyword] = { term : item.term, category : item.category };
    }
});

// you can now remove the unnecessary term information from the uniques map and keep just the category:
for (var key in uniques) {
  uniques[key] = uniques[key].category;
}
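For the follow-up case in the comments (a third, parallel array of categories, with output of the form keyword -> category), a minimal sketch of the no-serialization variant might look like this. The `categories` values are hypothetical, chosen to mirror the comment that 'british english' belongs to 'Car', and `keywordCategories` is an illustrative name. With the containment rule, when neither term contains the other, the first term seen is kept.

```javascript
// Containment rule from the answer.
function isMoreSpecific(term, reference) {
    return !reference || term.indexOf(reference) !== -1;
}

// Hypothetical helper: keyword -> category of its most specific term.
function keywordCategories(keywords, terms, categories) {
    var best = {}; // keyword -> { term, category } of the most specific term so far
    for (var i = 0; i < keywords.length; i++) {
        var current = best[keywords[i]];
        // Guard before reading .term: current is undefined for a new keyword.
        if (isMoreSpecific(terms[i], current && current.term)) {
            best[keywords[i]] = { term: terms[i], category: categories[i] };
        }
    }
    // Drop the term, keep only the category.
    var result = {};
    Object.keys(best).forEach(function (keyword) {
        result[keyword] = best[keyword].category;
    });
    return result;
}

var keywords = ["British English", "Fast British English", "British English", "Fast British English", "Blue Jeep"];
var terms = ["english", "english", "british english", "british english", "jeep"];
var categories = ["Languages", "Languages", "Car", "Car", "Car"]; // hypothetical

var result = keywordCategories(keywords, terms, categories);
// result: { "British English": "Car", "Fast British English": "Car", "Blue Jeep": "Car" }
```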
Tibos
  • You forced me to change my data structure, but I like your solution. – Kanan Farzali Jan 16 '14 at 11:49
  • It is not working if we add a cat key/value to data: var data= [ { keyword : "Anglo-Saxon English", term : 'english', cat : 'Language' }]; where the output I expect is var uniques = {keyword : cat}. – Kanan Farzali Jan 16 '14 at 12:45
  • I'm not entirely sure what your expectations are, but the solution should work with minor adjustments even if you have more than two properties per item. If the extra properties are part of the unique key, simply serialize them together, if they are part of the specificity, change the isMoreSpecific function. – Tibos Jan 16 '14 at 12:52
  • With another array of categories, which is exactly the same size as keywords and terms, because each keyword and its corresponding term has a corresponding category name. I only need to use this array at the end for the final output. Because my output should be var uniques={keyword:cat}, not var uniques = {keyword : term}, how could I modify the isMoreSpecific function to match the corresponding category? For now it returns the category 'Language' instead of the category 'Car', because 'british english' is the more specific term and the category name for the more specific term is Car. – Kanan Farzali Jan 16 '14 at 14:09
  • I added the solutions, though it would have been nice if the extra requirements were part of the original question. Now it looks like a patched job (because it actually is one). – Tibos Jan 16 '14 at 14:17
  • I am a bit confused. I don't need to include term in my output. Term is used to exclude the duplicate keywords with less matching terms. My output should simply be an object containing each keyword and its corresponding category. – Kanan Farzali Jan 16 '14 at 14:26
  • @KananFarzali That is exactly how it is used. After it is no longer used, it is removed in the first case by creating another object (keywordcategory for lack of inspiration) and in the second by changing the values in uniques. – Tibos Jan 16 '14 at 14:28
  • One more question: how do I make the code pick up the first more specific term and its corresponding keyword and category if two terms have the same specificity under different categories? For example, if keyword 'Fast British English' corresponds to the terms 'british' and 'english', I want it to pick up 'british' because it is first in the array. Currently it picks up the most specific term that comes last. – Kanan Farzali Jan 17 '14 at 15:09
  • `function isMoreSpecific(term, reference) { return !reference || reference.indexOf(term) === -1; }` – Tibos Jan 17 '14 at 15:14
  • Actually I realized your code doesn't do the job correctly: http://jsfiddle.net/ZqEhQ/ For example, 'Hot Brazilian Summer' gets the category 'Weather', which has the 'hot' term, instead of the 'Car' category, which has the 'Brazilian' term, even though the 'Car' column in my file comes before the 'Weather' column. – Kanan Farzali Jan 20 '14 at 09:33
  • Also, the 'Fast British English' and 'British English' keywords pick up the category 'Language' instead of the 'Car' category, which has the more specific 'british english' term. – Kanan Farzali Jan 20 '14 at 09:50
  • Good luck! I would gladly solve these problems if this were a freelance site where I would get paid for my work, but this is SO, where I help you solve problems, not where I solve problems for you. – Tibos Jan 20 '14 at 09:54
  • I put your solution into jsfiddle and ran it. It doesn't work. It means you didn't really help me. – Kanan Farzali Jan 20 '14 at 10:08

I'm not sure I understood correctly, but still...

Let's start with this small function:

function removeLessSpecific(ary) {
    return ary.filter(function(x) {
        return !ary.some(function(y) {
            return x != y && y.indexOf(x) >= 0;
        });
    });
}

When applied to, say,

["american football","english","british english","football","german"]

it returns only more specific or "standalone" terms

["american football","british english","german"]

Now let's convert your arrays into a mapping structure:

var mapping = {};

keywords.forEach(function(kw, i) {
    mapping[kw] = (mapping[kw] || []);
    mapping[kw].push(terms[i]);
})

The mapping will be like this:

{
     "Anglo-Saxon English":["english"],
     "American English":["english"],
     "British English":["english","british english"],
     ...
}

Finally, iterate over the mapping, remove less specific keywords and populate new arrays:

var newTerms = [], newKw = [];

Object.keys(mapping).forEach(function(term) {
    var kwords = mapping[term];
    removeLessSpecific(kwords).forEach(function(kw) {
        newTerms.push(term);
        newKw.push(kw);
    })
})

http://jsfiddle.net/d9Zq8/1/
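Putting the three pieces together on the question's full arrays gives the end-to-end sketch below. Unlike the snippets above, it uses the question's naming (`keywords` for the proper names, `terms` for the lowercase entries), so the output arrays are called `newKeywords` and `newTerms`; those names are illustrative.

```javascript
var keywords = ["Anglo-Saxon English","American English","British English","Canadian English","Fast British English","Austrian German","American Football","British English","Fast British English","Blue Jeep","Hot Summer","Mild Summer","Hot Brazilian Summer"];
var terms = ["english","english","english","english","english","german","football","british english","british english","jeep","summer","summer","summer"];

// Drop every string that is contained in another string of the same list.
function removeLessSpecific(ary) {
    return ary.filter(function (x) {
        return !ary.some(function (y) {
            return x !== y && y.indexOf(x) >= 0;
        });
    });
}

// Group the terms by keyword.
var mapping = {};
keywords.forEach(function (kw, i) {
    mapping[kw] = mapping[kw] || [];
    mapping[kw].push(terms[i]);
});

// Keep only the most specific term(s) for each keyword.
var newKeywords = [], newTerms = [];
Object.keys(mapping).forEach(function (kw) {
    removeLessSpecific(mapping[kw]).forEach(function (term) {
        newKeywords.push(kw);
        newTerms.push(term);
    });
});
```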

As a side note, your naming looks a bit confusing to me. In your example, the first array should really be called "terms" (= proper names) and the second one "keywords".

georg

Just as Tibos said, you need to restructure your data. Keeping two parallel arrays in sync by index is fragile.

var data = [
    {keyword: "Anglo-Saxon English", term: 'english'},
    {keyword: "British English", term: 'english'},
    {keyword: "British English", term: 'british english'},
    {keyword: "Fast British English", term: 'british english'},
    {keyword: "Canadian English", term: 'french'}
];

Create an array to hold the unique data:

var uniqueData = [];

STEP 1 - Extract all keywords into a uniqueKeywords array

var uniqueKeywords = [];

data.forEach(function(item) {
    // if the keyword isn't already in the list, push it
    if (uniqueKeywords.indexOf(item.keyword) === -1)
        uniqueKeywords.push(item.keyword);
});

STEP 2 - For each keyword, find all corresponding data objects and add only the most relevant one (the one with the longest term) to uniqueData

var extractMostRelevant = function(array){
    var mostRelevant = array[0];

    array.forEach(function(item){
        if(item !== array[0]){
            if(item.term.length > mostRelevant.term.length)
                mostRelevant = item;
        }
    });

    return mostRelevant;
};


uniqueKeywords.forEach(function(keyword){
    var itemsWithCurrentKeyword = [];

    data.forEach(function(item){
        if(keyword === item.keyword)
            itemsWithCurrentKeyword.push(item);
    });

    var mostRelevant = extractMostRelevant(itemsWithCurrentKeyword);
    uniqueData.push(mostRelevant);
});

There you go: now you have two arrays, data and uniqueData.
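The two steps scan `data` once per unique keyword. As a possible alternative, here is a minimal single-pass sketch under the same "longest term wins" rule; the lookup object `indexByKeyword` is an illustrative name. It is created with `Object.create(null)` so that keywords such as "constructor" cannot collide with inherited properties.

```javascript
var data = [
    {keyword: "Anglo-Saxon English", term: 'english'},
    {keyword: "British English", term: 'english'},
    {keyword: "British English", term: 'british english'},
    {keyword: "Fast British English", term: 'british english'},
    {keyword: "Canadian English", term: 'french'}
];

var uniqueData = [];
// keyword -> position of its current best item in uniqueData
var indexByKeyword = Object.create(null);

data.forEach(function (item) {
    var idx = indexByKeyword[item.keyword];
    if (idx === undefined) {
        // First time we see this keyword: remember where its item lives.
        indexByKeyword[item.keyword] = uniqueData.length;
        uniqueData.push(item);
    } else if (item.term.length > uniqueData[idx].term.length) {
        // Longer term = more relevant, as in extractMostRelevant.
        uniqueData[idx] = item;
    }
});
```

Keywords keep the order in which they first appear, which matches the order produced by the two-step version.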

Nahn

Original Array : [1,3,2,1,4,5,6,4,3,5,6,2,3,4,1,4,6,4,10,3,10,"a","a"]

Duplicates removed : [1,10,2,3,4,5,6,"a"]

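Array.prototype.removeDuplicates = function () {
  var temp = [];
  this.sort(); // note: sorts the array in place, comparing elements as strings
  for (var i = 0; i < this.length; i++) {
    if (this[i] == this[i + 1]) { continue; } // keep only the last of each run of equal values
    temp[temp.length] = this[i];
  }
  return temp;
};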

OR

var duplicatesArray = ['mike','shibu','shibu','alex'];

var uniqueArray = duplicatesArray.filter(function(elem, pos) {
    return duplicatesArray.indexOf(elem) == pos;
  }); 
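In environments with ES2015 support, a `Set` gives a shorter single-array variant. Unlike `removeDuplicates` above, it does not sort or modify the original array and keeps values in first-occurrence order. Like the other two snippets in this answer, it only deduplicates one array; it does not handle the question's parallel arrays or the specificity rule.

```javascript
var original = [1,3,2,1,4,5,6,4,3,5,6,2,3,4,1,4,6,4,10,3,10,"a","a"];

// A Set stores each value once (SameValueZero comparison, so 1 and "1" stay distinct);
// Array.from turns it back into an array in insertion order.
var unique = Array.from(new Set(original));
// unique: [1, 3, 2, 4, 5, 6, 10, "a"]
```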
Shibu Thomas
  • There are two arrays of the same length, and the OP wants to remove duplicate records from one array, along with the corresponding records from the second array, according to some criteria – A.T. Jan 16 '14 at 10:46