11

Take the following string as an example:

var string = "spanner, span, spaniel, span";

From this string I would like to find the duplicate words, remove all the duplicates while keeping one occurrence of each word in place, and then output the revised string.

Which in this example would be:

var string = "spanner, span, spaniel";

I've set up a jsFiddle for testing: http://jsfiddle.net/p2Gqc/

Note that the order of the words in the string is not consistent, and neither is the length of each string, so I don't think a regex is going to do the job here. I'm thinking of something along the lines of splitting the string into an array, but I'd like it to be as light on the client as possible and super speedy.

PSL
CLiown
  • Nice fiddle, but there's actually no logic behind it... Have a look into [String.split()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split?redirectlocale=en-US&redirectslug=JavaScript%2FReference%2FGlobal_Objects%2FString%2Fsplit). You can then loop through the array of words and check for duplicates. – MCL May 30 '13 at 19:09
  • 1
    a) build an array from your string. b) iterate over the array and append each element to a new array if that element is not in the new array. c) convert the new array to a string. – j08691 May 30 '13 at 19:10
  • I have some questions. Is performance an important point? How long can a string be (max)? Do you want to remove ALL duplicated words or just the first one found? – Karl-André Gagnon May 30 '13 at 19:10

11 Answers

46

How about something like this?

Split the string to get an array, filter it to remove duplicate items, and join the remaining items back together.

var uniqueList=string.split(',').filter(function(item,i,allItems){
    return i==allItems.indexOf(item);
}).join(',');

$('#output').append(uniqueList);

Fiddle

For browsers that do not support Array.prototype.filter, you can work around it by adding this polyfill to your JavaScript.

See Filter

if (!Array.prototype.filter)
{
  Array.prototype.filter = function(fun /*, thisp*/)
  {
    "use strict";

    if (this == null)
      throw new TypeError();

    var t = Object(this);
    var len = t.length >>> 0;
    if (typeof fun != "function")
      throw new TypeError();

    var res = [];
    var thisp = arguments[1];
    for (var i = 0; i < len; i++)
    {
      if (i in t)
      {
        var val = t[i]; // in case fun mutates this
        if (fun.call(thisp, val, i, t))
          res.push(val);
      }
    }

    return res;
  };
}
PSL
  • 3
    You may want to trim your strings. This fails var string = "spanner,span, spaniel, span"; – Ceres May 30 '13 at 19:12
  • @PSL Can you explain how the parameters passed to function(item,i,allItems) work when the function returns i==allItems.indexOf(item)? As I understand it, allItems is the whole list and item is each separate item in it, but what is "i"? – Viktor Mar 28 '16 at 13:01
  • `i` is the index of the current item. indexOf returns the index of the first matching item in the list `allItems`, so this check returns false for duplicate items, which are then excluded from the filtered list. – PSL Mar 28 '16 at 13:06
  • 1
    This also does a nice job of removing duplicated lines in Node.js: read the file asynchronously, then split on '\n'. – manu Jun 16 '16 at 19:41
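Following up on the comment from Ceres about inconsistent spacing, here is a minimal sketch of a variation that trims each item before comparing. The helper name dedupeList is hypothetical and not part of the answer above, and it assumes String.prototype.trim is available (it arrived in ECMAScript 5, like filter).

```javascript
// Hypothetical helper (not from the original answer): split on commas,
// trim each item, then keep only the first occurrence of every word.
function dedupeList(input) {
    return input
        .split(',')
        .map(function (item) { return item.trim(); })   // normalize whitespace first
        .filter(function (item, i, allItems) {
            // drop empty items and anything already seen at an earlier index
            return item !== '' && i === allItems.indexOf(item);
        })
        .join(', ');
}

console.log(dedupeList("spanner,span, spaniel, span")); // spanner, span, spaniel
```

Trimming before the indexOf check means "span" and " span" are treated as the same word, which the untrimmed version does not do.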
3

If none of the above works for you, here is another way:

var str = "spanner, span, spaniel, span";
str = str.replace(/[ ]/g,"").split(",");
var result = [];
for(var i =0; i < str.length ; i++){
    if(result.indexOf(str[i]) == -1) result.push(str[i]);
}
result=result.join(", ");

Or if you want it to be in a better shape try this:

Array.prototype.removeDuplicate = function(){
   var result = [];
   for(var i =0; i < this.length ; i++){
       if(result.indexOf(this[i]) == -1) result.push(this[i]);
   }
   return result;
}
var str = "spanner, span, spaniel, span";
str = str.replace(/[ ]/g,"").split(",").removeDuplicate().join(", ");
Hirad Nikoo
  • I'm using a company version of ie11 (various forced compatibility shenanigans) and this was the only solution that worked. Thank you for posting it :) – elboffor Jul 01 '16 at 01:15
2

Alternate Solution using Regular Expression

By making use of positive lookahead, you can strip off all the duplicate words.

Regex /(\b\S+\b)(?=.*\1)/ig, where

  • \b - matches a word boundary
  • \S - matches any character that is not white space (spaces, tabs, line breaks, etc.)
  • ?= - used for positive lookahead
  • ig - flags for case-insensitive and global search respectively
  • +,* - quantifiers. + -> 1 or more, * -> 0 or more
  • () - define a group
  • \1 - back-reference to the results of the previous group

var string1 = 'spanner, span, spaniel, span';
var string2 = 'spanner, span, spaniel, span, span';
var string3 = 'What, the, the, heck';
// modified regex to remove the preceding ', ' as per your scenario;
// the \b after \1 stops ', span' from matching inside ', spaniel'
var result1 = string1.replace(/(\b, \w+\b)(?=.*\1\b)/ig, '');
var result2 = string2.replace(/(\b, \w+\b)(?=.*\1\b)/ig, '');
var result3 = string3.replace(/(\b, \w+\b)(?=.*\1\b)/ig, '');
console.log(string1 + ' => ' + result1);
console.log(string2 + ' => ' + result2);
console.log(string3 + ' => ' + result3);

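The main caveat is that this regex keeps only the last instance of a duplicated word and strips off all the earlier ones, so the relative order of the words can change. Because the modified pattern requires a preceding ', ', the first word of the string is never removed, even when it is repeated later. For those who care only about removing duplicates and not about the order of the words, this should work!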

Community
Niket Pathak
2

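The getUniqueWordString function filters out redundant words and then joins the rest back together with the delimiter. It also handles input in which the same word appears in both upper and lower case: the string is lowercased first, so the output is all lowercase, and the last occurrence of each word is the one that is kept.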

function getUniqueWordString(str, delimiter) {
    return str.toLowerCase().split(delimiter).filter(function(e, i, arr) {
        return arr.indexOf(e, i+1) === -1;
    }).join(delimiter);
}

let str = "spanner, span, spaniel, span, SPAN, SpaNiel";
console.log(getUniqueWordString(str, ", "))
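If you would rather keep the first occurrence and preserve its original casing, something along these lines might work. This is only a sketch: the name getUniqueWordsKeepFirst is hypothetical, and it assumes an environment where Set is available (ES2015 or later).

```javascript
// Hypothetical variant: compare words case-insensitively, but keep the
// first occurrence of each word with the casing it had in the input.
function getUniqueWordsKeepFirst(str, delimiter) {
    const seen = new Set(); // lowercase forms of the words already kept
    return str.split(delimiter).filter(function (word) {
        const key = word.toLowerCase();
        if (seen.has(key)) {
            return false;   // a case-insensitive duplicate, drop it
        }
        seen.add(key);
        return true;        // first time this word appears, keep it
    }).join(delimiter);
}

console.log(getUniqueWordsKeepFirst("spanner, span, spaniel, span, SPAN, SpaNiel", ", "));
// spanner, span, spaniel
```

The Set look-up also avoids rescanning the array for every word, which may matter for long strings.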
2

Modern approach using Set

let string = "spanner, span, spaniel, span";

// A Set keeps only distinct values, in insertion order
let unique = [...new Set(string.split(", "))];

console.log(unique.join(", "));
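The Set idea can be wrapped in a small function that also tolerates inconsistent spacing around the commas. This is a sketch under assumptions: the name uniqueWords is hypothetical, and the input is assumed to be comma-separated.

```javascript
// Hypothetical helper: dedupe a comma-separated string with a Set.
// A Set iterates in insertion order, so the first occurrence of each
// word keeps its position.
function uniqueWords(input) {
    const words = input
        .split(",")
        .map((word) => word.trim())         // ignore spacing around commas
        .filter((word) => word !== "");     // skip empty entries
    return [...new Set(words)].join(", ");
}

console.log(uniqueWords("spanner, span, spaniel, span")); // spanner, span, spaniel
```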
1
// Take the following string
var string = "spanner, span, spaniel, span";
var arr = string.split(", ");
var unique = [];
$.each(arr, function (index,word) {
    if ($.inArray(word, unique) === -1) 
        unique.push(word);

});

alert(unique);

Live DEMO

gdoron
1

Both the other answers would work fine, although the filter array method used by PSL was added in ECMAScript 5 and won't be available in old browsers.

If you are handling long strings then using $.inArray/Array.indexOf isn't the most efficient way of checking if you've seen an item before (it would involve scanning the whole array each time). Instead you could store each word as a key in an object and take advantage of hash-based look-ups which will be much faster than reading through a large array.

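var tmp = {};      // words seen so far, stored as keys for fast look-up
var arrOut = [];   // unique words in their original order
$.each(string.split(', '), function(_, word){
    // hasOwnProperty avoids false hits on inherited keys such as "constructor"
    if (!Object.prototype.hasOwnProperty.call(tmp, word)){
        tmp[word] = 1;
        arrOut.push(word);
    }
});
var result = arrOut.join(', ');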
codebox
1
<script type="text/javascript">
var str = prompt("Enter String::", "") || ""; // prompt returns null if cancelled
var arr = str.split(",");
var unique = [];
for (var i = 0; i < arr.length; i++)
{
    if ((i == arr.indexOf(arr[i])) || (arr.indexOf(arr[i]) == arr.lastIndexOf(arr[i])))
        unique.push(arr[i]);
}
// join returns a new string, so pass its result to alert
alert(unique.join(","));
</script>

This code block removes duplicate words from a comma-separated string.

The first condition of the if statement, (i==arr.indexOf(arr[i])), includes the first occurrence of a repeated word in the result (the variable unique in this code).

The second condition, (arr.indexOf(arr[i])==arr.lastIndexOf(arr[i])), includes all non-repeating words.
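To see how the two conditions interact, here is a small sketch with two hypothetical helper functions (they are not part of the answer above). A word that appears only once is necessarily at its own first index, so the first condition already covers it, and both versions should give the same result.

```javascript
var arr = "spanner,span,spaniel,span".split(",");

// Both conditions, as in the answer above.
function keepWithBothConditions(list) {
    return list.filter(function (word, i) {
        return i === list.indexOf(word) ||
               list.indexOf(word) === list.lastIndexOf(word);
    });
}

// Only the first condition: keep a word when i is its first index.
function keepFirstOccurrence(list) {
    return list.filter(function (word, i) {
        return i === list.indexOf(word);
    });
}

console.log(keepWithBothConditions(arr).join(",")); // spanner,span,spaniel
console.log(keepFirstOccurrence(arr).join(","));    // spanner,span,spaniel
```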

1

Below is quick, easy-to-understand code to remove duplicate words from a string:

var string = "spanner, span, spaniel, span";


var uniqueListIndex=string.split(',').filter(function(currentItem,i,allItems){
    return (i == allItems.indexOf(currentItem));
});

var uniqueList=uniqueListIndex.join(',');

alert(uniqueList);//Result:spanner, span, spaniel

As simple as this can solve your problem. Hope this helps. Cheers :)

praveenak
1

To delete all duplicate words, I use this code:

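<script>
// Treats spaces and commas as separators and keeps the first occurrence of each word.
function deleteDuplicate(text) {
    var words = text.toString().replace(/ /g, ",").split(",");
    var unique = [];
    for (var i = 0; i < words.length; i++) {
        if (unique.indexOf(words[i]) === -1) unique.push(words[i]);
    }
    // join with a single space (joining with ", " and then replacing "," left double spaces)
    return unique.join(" ");
}
document.write(deleteDuplicate("g g g g"));
</script>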
anmml
-1
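var string = "spanner, span, spaniel, span";

var strArray = string.split(",");

// Use an object as a set: each distinct word becomes a key, so repeats collapse.
var unique = {};
for (var i = 0; i < strArray.length; i++)
{
  // Strip surrounding whitespace with a regex (String.prototype.trim is missing in IE8).
  unique[strArray[i].replace(/^\s+|\s+$/g, "")] = true;
}

// You can easily traverse the unique words with for...in, e.g. for (var word in unique) { ... }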

I like this for three reasons. First, it works in IE8 and any other browser.

Second, it avoids rescanning an array for every word, and the result is guaranteed to be unique because object keys cannot repeat.

Last, it works for other string arrays whose entries contain white space, such as:

var strings = ["New York", "New Jersey", "South Hampsire", "New York"];

In this case only three keys are stored, one for each unique entry.

Praveen Kumar