In the line
words = line.split("\\s+");
you split by a regex, which is much slower than splitting by a single character (about 5 times slower on my machine).
See also the question "Java split String performances".
If the words are always separated by exactly one space, the fix is simple:
words = line.split(" ");
Replace the regex split with this line and your code will run faster.
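If words can be separated by several spaces, add this line after the loop:
text.remove("");
and still replace your regex split with the single-character split. Consecutive spaces make split(" ") return empty strings between them, and this line removes the resulting empty entry from the set.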
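To show why the text.remove("") call is needed, here is a minimal sketch of what the single-character split returns when two spaces sit between words. The class name SplitDemo and the helper uniqueWords are made up for this illustration and are not part of your code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class SplitDemo {
    // Hypothetical helper: collects the unique tokens of one line
    // using the single-character split.
    static Set<String> uniqueWords(String line) {
        Set<String> text = new TreeSet<>();
        for (String word : line.split(" ")) {
            text.add(word);
        }
        text.remove(""); // drop the empty token produced by consecutive spaces
        return text;
    }

    public static void main(String[] args) {
        // Two spaces between "1" and "2" yield an empty string in the middle.
        System.out.println(Arrays.toString("1  2".split(" "))); // prints [1, , 2]
        System.out.println(uniqueWords("1  2 1"));              // prints [1, 2]
    }
}
```

Because the result goes into a Set, all empty tokens collapse into a single "" entry, so one remove call after the loop is enough.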
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;

public class Test {
    public static void main(String[] args) throws IOException {
        // The string contains only 1 and 2, with two spaces between the first 1 and the 2,
        // so text.size() should be 2.
        String txt = "1  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1";
        InputStream inpstr = new ByteArrayInputStream(txt.getBytes());
        BufferedReader read = new BufferedReader(new InputStreamReader(inpstr));
        Set<String> text = new TreeSet<>();
        String[] words;
        String line;
        long startTime = System.nanoTime();
        while ((line = read.readLine()) != null) {
            // words = line.split("\\s+"); // regex split: about 5 times slower on my machine
            words = line.split(" ");
            for (int i = 0; i < words.length; i++) {
                text.add(words[i]);
            }
        }
        text.remove(""); // needed only if words can be separated by multiple spaces
        long endTime = System.nanoTime();
        // Prints the elapsed time in nanoseconds and the number of unique words.
        System.out.println((endTime - startTime) + " " + text.size());
    }
}
You can also replace your for loop with a single call (this needs import java.util.Arrays):
text.addAll(Arrays.asList(words));
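As a small sketch of that replacement, the class below fills a set both ways and checks that the results match. It also shows Collections.addAll, a standard-library alternative I am adding here that accepts the array directly. AddAllDemo and its method names are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class AddAllDemo {
    // The variant from the answer: wrap the array in a list view, then add it in bulk.
    static Set<String> viaAsList(String line) {
        Set<String> text = new TreeSet<>();
        text.addAll(Arrays.asList(line.split(" ")));
        return text;
    }

    // Alternative (not from the answer): Collections.addAll takes the array as varargs.
    static Set<String> viaCollections(String line) {
        Set<String> text = new TreeSet<>();
        Collections.addAll(text, line.split(" "));
        return text;
    }

    public static void main(String[] args) {
        String line = "b a c a";
        System.out.println(viaAsList(line));      // prints [a, b, c]
        System.out.println(viaCollections(line)); // prints [a, b, c]
    }
}
```

Both variants give the same set as the explicit loop. Which one is faster would have to be measured on your data.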