I want to know an efficient way to remove stop words from a huge text corpus. Currently my approach is to convert the stop words into a regex, match each line of text against it, and remove the matches.
For example:
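// Alternation of stop words between word boundaries, plus any trailing whitespace
String regex = "\\b(?:a|an|the|was|i)\\b\\s*";
String line = "hi this is regex approach of stop word removal";
// Replace every stop word match with the empty string
String lineWithoutStopword = line.replaceAll(regex, "");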
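A minimal sketch of the same regex approach, assuming the stop words are held in a list rather than hard-coded. The class name `StopWordRegex` and the methods `buildPattern` and `removeStopWords` are hypothetical names used only for illustration. `String.replaceAll` recompiles the regex on every call, so the sketch compiles the `Pattern` once and reuses it for each line:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StopWordRegex {
    // Build one alternation pattern from the stop word list and compile it once,
    // so the same Pattern object can be reused for every line of the corpus.
    static Pattern buildPattern(List<String> stopWords) {
        String alternation = stopWords.stream()
                .map(Pattern::quote)               // escape any regex metacharacters
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b\\s*");
    }

    // Remove every stop word match (and the whitespace after it) from one line.
    static String removeStopWords(Pattern pattern, String line) {
        return pattern.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern pattern = buildPattern(List.of("a", "an", "the", "was", "i"));
        // prints "cat on mat"
        System.out.println(removeStopWords(pattern, "the cat was on a mat"));
    }
}
```

Matching here is case-sensitive, as in the regex above, so "The" would not be removed unless the pattern is compiled with `Pattern.CASE_INSENSITIVE`.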
Is there any other, more efficient approach to removing stop words from a huge corpus?
Thanks.