Splitting a text into words using BufferedReader

Question

I am stuck on a problem. I have to add ONLY words to a TreeSet (and print the size of the TreeSet), reading the input with a BufferedReader, but my solution does not pass the speed test (time limit). The text contains only letters and whitespace (a line can also be empty). I need a faster solution than this one:

BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
Set<String> text = new TreeSet<String>();
String words[], line;
while ((line = read.readLine()) != null) {
    words = line.split("\\s+");
    for (int i = 0; i < words.length && words[0].length() > 0; i++) {
        text.add(words[i]);
    }
}
System.out.println(text.size());

Is there another "split" method I could use so that the program spends less time splitting?

Not sure you'd want the "`words[0].length() > 0`" condition in the loop guard, as this stops adding anything if the string starts with a space, even if there are words after. Put that as a conditional inside the loop. (And just use a for each loop, no need to faff with array indices). — Andy Turner, Aug 13 '21 at 10:07
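
A minimal sketch of what this comment suggests, assuming the rest of the question's code stays as it is. The class and method names (`WordSetSketch`, `collectWords`) are made up for the illustration, and the input comes from a `StringReader` instead of `System.in` so the example can run on its own:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this illustration.
class WordSetSketch {

    // Reads all lines and collects the non-empty tokens into a sorted set.
    static Set<String> collectWords(BufferedReader reader) throws IOException {
        Set<String> text = new TreeSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.split("\\s+")) {
                // The emptiness check is applied to each token,
                // not to words[0] in the loop guard.
                if (!word.isEmpty()) {
                    text.add(word);
                }
            }
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        // The first line starts with a space; with the original loop guard
        // that whole line would have been skipped.
        BufferedReader reader =
                new BufferedReader(new StringReader(" alpha beta\n\ngamma alpha"));
        System.out.println(collectWords(reader).size()); // prints 3
    }
}
```

With the original guard, the same input would give 2, because "alpha" and "beta" on the first line are never added.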

vszholobov · Accepted Answer · 2021-08-13T11:09:11.713

In the line

words = line.split("\\s+");

you split by a regex, which is much slower than splitting by a single character (about 5 times slower on my machine). See also: "Java split String performances".

If the words are separated by exactly one space, the solution is simple:

words = line.split(" ");

Just replace your split with this line and your code will run faster.

If words can be separated by several spaces, also add this line after the loop:

text.remove("");

and still replace your regex split with the single-character split.
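
To make it clearer why the `text.remove("")` line is needed, here is a small sketch of what `split(" ")` returns when there are several spaces in a row. The names `SingleSpaceSplitSketch` and `wordsOf` are hypothetical. Note that this assumes the only separator is the space character; if the input can also contain tabs, a single-character split on " " will not separate those.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this illustration.
class SingleSpaceSplitSketch {

    // Splits on a single space, then drops the empty token once at the end.
    static Set<String> wordsOf(String line) {
        Set<String> text = new TreeSet<>(Arrays.asList(line.split(" ")));
        // Consecutive or leading spaces produce empty strings; since the set
        // holds at most one "", a single remove call is enough.
        text.remove("");
        return text;
    }

    public static void main(String[] args) {
        // Two spaces between a and b give an empty token in the middle.
        System.out.println(Arrays.toString("a  b".split(" "))); // prints [a, , b]
        System.out.println(wordsOf(" a  b a "));                // prints [a, b]
    }
}
```

Because the set stores the empty string at most once, removing it after the loop is cheaper than testing every token inside the loop.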

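import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;

public class Test {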
    public static void main(String[] args) throws IOException {
        // string contains 1, 2 and two spaces between 1 and 2. text size should be 2
        String txt = "1  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1";

        InputStream inpstr = new ByteArrayInputStream(txt.getBytes());

        BufferedReader read = new BufferedReader(new InputStreamReader(inpstr));
        Set<String> text = new TreeSet<>();
        String[] words;
        String line;
        long startTime = System.nanoTime();
        while ((line = read.readLine()) != null) {
            //words = line.split("\\s+"); -- runs 5 times slower
            words = line.split(" ");
            for (int i = 0; i < words.length; i++) {
                text.add(words[i]);
            }
        }
        text.remove("");  // add only if words can be separated with multiple spaces

        long endTime = System.nanoTime();
        System.out.println((endTime - startTime) + " " + text.size());
    }
}

You can also replace your for loop with:

text.addAll(Arrays.asList(words));
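
Putting the pieces together, the whole read loop could look roughly like this. It is only a sketch: `AddAllSketch` and `countWords` are made-up names, and the input is a `StringReader` here so it runs without `System.in`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this illustration.
class AddAllSketch {

    // Counts distinct words, splitting each line on a single space.
    static int countWords(BufferedReader read) throws IOException {
        Set<String> text = new TreeSet<>();
        String line;
        while ((line = read.readLine()) != null) {
            // addAll replaces the explicit for loop over the array.
            text.addAll(Arrays.asList(line.split(" ")));
        }
        // Empty tokens come from empty lines and from repeated or leading spaces.
        text.remove("");
        return text.size();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader read =
                new BufferedReader(new StringReader("one two  three\n\n two one"));
        System.out.println(countWords(read)); // prints 3
    }
}
```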

score 0 · Answer 2 · answered Aug 13 '21 at 10:20

Based on the assumptions you provided, I would simply add everything to the set and delete the unwanted values from it at the end. This should reduce the time spent checking the condition for every word (which is not much, really).

BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
Set<String> text = new TreeSet<String>();
String words[], line;
while ((line = read.readLine()) != null) {
  words = line.split("\\s+");
  for(String value: words) {
    text.add(value);
  }
}
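// split("\\s+") can only produce an empty string here (leading whitespace or an empty line).
// It never produces " " or null, and remove(null) on a TreeSet with natural ordering
// throws a NullPointerException.
text.remove("");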
System.out.println(text.size());

score 0 · Answer 3 · answered Aug 13 '21 at 11:20

You can of course stream your BufferedReader into your TreeSet:

Collection<String> c = read.lines()
        .flatMap(line -> Stream.of(line.split("\\s+")).filter(word -> word.length() > 0))
        .collect(Collectors.toCollection(TreeSet::new));
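
For completeness, a self-contained sketch of the same idea with the imports it needs. The names `StreamWordsSketch` and `collectWords` are hypothetical, and the reader is fed from a `StringReader` rather than `System.in`. I have not measured whether this is any faster than the plain loop; it is mainly shorter.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class StreamWordsSketch {

    // Streams the lines, splits each one, and collects non-empty tokens.
    static Set<String> collectWords(BufferedReader read) {
        return read.lines()
                .flatMap(line -> Stream.of(line.split("\\s+")))
                // Empty tokens come from empty lines or leading whitespace.
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        BufferedReader read =
                new BufferedReader(new StringReader("b a\n\n  c a"));
        System.out.println(collectWords(read)); // prints [a, b, c]
    }
}
```

`BufferedReader.lines()` wraps any `IOException` in an `UncheckedIOException`, so the method needs no `throws` clause.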

g00se

3,207
2
5
9

splitting a text into words using bufferReader

3 Answers3