0

I have a big list<String> that is about 50,000 records. I want an effective way to search for a specific subString in that List and get the Strings that contains that subString.

My code is like this so far:

List<String> result = new ArrayList<>();
if (aCondition) {
 for (String file : arg) {
   if (file.toLowerCase().contains(tag.toLowerCase())) {
     result.add(file);
    }
  }
} 
return result;
GhostCat
  • 137,827
  • 25
  • 176
  • 248
SemyColon.Me
  • 378
  • 3
  • 18
  • See the second answer here: https://stackoverflow.com/questions/18340097/what-is-the-fastest-substring-search-method-in-java for some explanations. – hamena314 Jun 27 '17 at 10:33

3 Answers3

1

It depends on what you mean by effective.

If you want to get to "minimal" CPU usage, then there isn't much you can do: you have to iterate that list; and compare all entries. The only obvious thing to not do: call tag.toLowerCase() for each loop body. Just compute that value once before entering the loop!

If you care about getting result in less time, the answer is simple: use multiple threads, and have each thread search a "slice" of the overall list (of course, that can turn complicated quickly, as you now have to preserve order and other subtle things).

Finally: you might want to look into tools such ElasticSearch - as there are various products designed to exactly do that: search huge amounts of text.

GhostCat
  • 137,827
  • 25
  • 176
  • 248
  • i want to do this on a android device so it can't use huge amount of ram or cpu but i need to be quick as much as possible – SemyColon.Me Jun 27 '17 at 10:34
  • As mentioned by GhostCat you may look for specialized products: Also [Apache Lucene](https://lucene.apache.org/core/) should be an applicable solution (Elasticsearch is based on Lucene). Against Elasticsearch, this should be more lightweight and you may benefit from its inverted index-approach. You may check this [thread](https://stackoverflow.com/questions/7821103/lucene-in-android) as well - it mentions full text search in [SQLite](http://www.sqlite.org/fts3.html) – chocomuesli Jun 27 '17 at 10:44
0

Consider to use a SQL database to hold big amounts of data.

In this way you can use a simple query to get a result String containing a substring (look at example below). Furthermore your memory will be free of that amount of data loaded in list.

e.g.

SELECT * from word_list_table WHERE word LIKE'%substring%'
Giacomo Lai
  • 494
  • 3
  • 15
0

If your processor has more than one core just go and use parallel streams.

List<String> result = lines.parallelStream() //convert list to parallel stream
            .filter(line -> file.toLowerCase().contains(tag.toLowerCase()))    // check your condition 
            .collect(Collectors.toList());     // collect output    

The above code will process your strings faster if your processor has more than one core because a parallel stream is opened.