I want to check whether a sentence contains a word from a list of words, each of which is mapped to a category. I have a class KeyValue.java that holds a category name and a word, and a method filterCategory (in Classification.java) that checks whether the text contains one of the words. I now have 10,000 keywords mapped to different categories, and the trouble is that the check is far too slow. Can you suggest alternative methods to speed up the classification? Thanks for the help.
KeyValue.java
public class KeyValue {
    private String key;   // category name, e.g. "sports"
    private String value; // keyword, e.g. "football"

    public KeyValue(String key, String value) {
        this.key = key;
        this.value = value;
    }

    public KeyValue() {
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }
}
Classification.java
import java.util.ArrayList;
import java.util.List;

class Classification
{
    private static List<KeyValue> keyMap = new ArrayList<KeyValue>();

    static {
        getWordMap();
    }

    // Fills keyMap on first use (key = category, value = keyword) and returns it.
    public static List<KeyValue> getWordMap()
    {
        if (keyMap.size() == 0)
        {
            keyMap.add(new KeyValue("sports", "football"));
            keyMap.add(new KeyValue("sports", "basketball"));
            keyMap.add(new KeyValue("sports", "olympics"));
            keyMap.add(new KeyValue("sports", "cricket"));
            keyMap.add(new KeyValue("sports", "t20"));
        }
        return keyMap;
    }
    // Returns the first keyword entry found in the text, or ("general", "0") if none matches.
    public static KeyValue filterCategory(String filteredText)
    {
        KeyValue kv = null;
        for (KeyValue tkv : keyMap)
        {
            String value = tkv.getValue();
            String lc = filteredText.toLowerCase();
            // Replaces symbols with spaces, then normalizes the text.
            lc = FormatUtil.replaceEnglishSymbolsWithSpace(lc);
            String lastWord = "";
            if (lc.contains(" "))
            {
                lastWord = lc.substring(lc.lastIndexOf(" ") + 1);
                // Match the keyword as a whole word: at the start, in the middle, or at the end.
                if (lc.startsWith(value + " ") || lc.contains(" " + value + " ") || value.equals(lastWord))
                {
                    kv = new KeyValue(tkv.getKey(), tkv.getValue());
                    break;
                }
            }
            else if (lc.contains(value))
            {
                kv = new KeyValue(tkv.getKey(), tkv.getValue());
                break;
            }
        }
        if (kv == null)
        {
            return new KeyValue("general", "0");
        }
        else
        {
            kv.setValue("100");
            return kv;
        }
    }
}
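
For comparison, here is a rough sketch of the kind of alternative I have in mind: instead of scanning all 10,000 keywords for every sentence, normalize the sentence once, split it into words, and look each word up in a HashMap. The class name FastClassification and the method categoryOf are hypothetical, and the regex split is only an assumed stand-in for FormatUtil.replaceEnglishSymbolsWithSpace, which is not shown here. It also assumes every keyword is a single word, and it returns the first matching word in sentence order rather than the first matching keyword in list order.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical alternative to Classification: keyword -> category lookup in a HashMap.
class FastClassification {
    // Assumption: every keyword is a single token, so it can be used directly as a map key.
    private static final Map<String, String> WORD_TO_CATEGORY = new HashMap<>();

    static {
        String[] sportsWords = {"football", "basketball", "olympics", "cricket", "t20"};
        for (String word : sportsWords) {
            WORD_TO_CATEGORY.put(word, "sports");
        }
    }

    // Returns the category of the first token that is a known keyword, or "general" if none is.
    static String categoryOf(String text) {
        // Normalize once per sentence, not once per keyword.
        // Assumption: splitting on non-alphanumeric characters approximates the
        // symbol replacement done by FormatUtil in the original code.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        for (String token : tokens) {
            String category = WORD_TO_CATEGORY.get(token);
            if (category != null) {
                return category;
            }
        }
        return "general";
    }
}
```

With this layout the cost per sentence depends on the number of words in the sentence rather than on the number of keywords, since each HashMap lookup takes roughly constant time on average. Multi-word keywords would need extra handling that this sketch does not attempt.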