I'm writing something that takes a block of text and breaks it down into possible database queries that could be used to find similar blocks of text (something like the "similar questions" list that is generated while I type this). The basic process:
- Remove stop words from text
- Remove special characters
- From remaining text create an array of unique "stems"
- Create an array of all possible combinations of those stems (this is where I'm stuck... kind of)
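For context, the first three steps could be sketched roughly as below. This is only an illustration: the stop-word list and the `naiveStem` function are hypothetical stand-ins, not a real stemmer, and `extractStems` is a name made up for this example. The sketch strips special characters before filtering stop words so that punctuation attached to a word does not hide it.

```javascript
// Hypothetical stand-ins: a tiny stop-word list and a crude suffix stripper.
// A real pipeline would use a fuller list and a proper stemming algorithm.
var STOP_WORDS = new Set(['a', 'an', 'the', 'of', 'to', 'and', 'is', 'are', 'in']);

function naiveStem(word) {
  // Strip a few common English suffixes; for illustration only.
  return word.replace(/(ing|ed|es|s)$/, '');
}

function extractStems(text) {
  var words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // replace special characters with spaces
    .split(/\s+/);
  var seen = new Set();
  var stems = [];
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    if (word === '' || STOP_WORDS.has(word)) continue; // drop stop words
    var stem = naiveStem(word);
    if (stem === '' || seen.has(stem)) continue; // keep each stem once
    seen.add(stem);
    stems.push(stem);
  }
  return stems;
}
```

The returned array is what would be passed as candList in the combination step.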
Here's what I have so far:
// baseList starts with an empty array
// candList starts with the array of unique stems
// target is where the arrays of unique combinations are stored
function createUniqueCombos(baseList, candList, target) {
    for (var i = 0; i < candList.length; i++) {
        // copy the base list
        var newList = baseList.slice(0);
        // add the candidate list item to the base list copy
        newList.push(candList[i]);
        // add the new array to the target array
        target.push(newList);
        // re-call the function using the new array as baseList
        // and the remaining candidates as candList
        var nextCandList = candList.slice(i + 1);
        createUniqueCombos(newList, nextCandList, target);
    }
}
This works, but on blocks of text longer than about 25 words it crashes my browser. I realize the number of combinations grows exponentially: n unique stems yield 2^n - 1 non-empty combinations, so 25 unique stems would already mean over 33 million arrays. What I'd like to know is:
- Is there a more efficient way to do this?
- How could I define a min/max combination array length?
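To make the second question concrete, here is a minimal sketch of what a length-bounded version might look like. The function name `createBoundedCombos` and the `minLen`/`maxLen` parameters are assumptions introduced for illustration, not part of my current code. It walks the stems by index with a single working array instead of slicing a new candidate list on every call, and it stops recursing once `maxLen` is reached.

```javascript
// Collect every combination of `stems` whose length is between
// minLen and maxLen (inclusive). All names here are illustrative.
function createBoundedCombos(stems, minLen, maxLen) {
  var target = [];
  var current = []; // the combination being built, reused across calls

  function extend(start) {
    if (current.length > 0 && current.length >= minLen) {
      target.push(current.slice(0)); // store a copy, since current keeps changing
    }
    if (current.length >= maxLen) {
      return; // prune: never build anything longer than maxLen
    }
    for (var i = start; i < stems.length; i++) {
      current.push(stems[i]);
      extend(i + 1); // only later stems may follow, so no duplicates
      current.pop();
    }
  }

  extend(0);
  return target;
}

// Example: pairs and triples from four stems
var combos = createBoundedCombos(['a', 'b', 'c', 'd'], 2, 3);
console.log(combos.length); // 10 (6 pairs + 4 triples)
```

With 25 stems and lengths limited to 1 through 3, this would produce 25 + 300 + 2300 = 2625 combinations instead of roughly 33.5 million, which is the kind of reduction I'm hoping for.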