The reason that temp += s1.nextLine() + "\n" takes a long time is that you are generating a lot of strings. In fact, for N characters read, you are generating O(N) large strings and copying O(N^2) characters.
The solution to (just) that would be to append to a StringBuilder instead of using String concatenation. However, that's not the real solution here, because the temp string is not your ultimate goal. Your ultimate goal is to create an array of words.
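For reference, the StringBuilder version of the loop might look roughly like the sketch below. The class and method names (ReadAllText, readAll) are made up for illustration, and s1 is assumed to be a java.util.Scanner as in your code.

```java
import java.util.Scanner;

public class ReadAllText {
    // Reads every remaining line into one string. Appending to a StringBuilder
    // avoids re-copying the whole accumulated text each time a line is added.
    static String readAll(Scanner s1) {
        StringBuilder sb = new StringBuilder();
        while (s1.hasNextLine()) {
            sb.append(s1.nextLine()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Scanner over a literal string stands in for the real file here.
        Scanner s1 = new Scanner("first line\nsecond line");
        System.out.print(readAll(s1));
    }
}
```

This fixes the quadratic copying, but it still builds one big string that you would then have to split again.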
What you really need to do is to split each line into words, and accumulate the words. But accumulating them directly into an array won't work well ... because arrays cannot be extended. So what I recommend is that you do the following:
- create an ArrayList<String> to hold all of the words
- read and split each line into an array of words
- append the words in the array to the list of all words
- when you are finished, use List.toArray to produce the final array of words ... or maybe just leave the words in the list, if that is more appropriate.
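The steps above could be sketched as follows. This is only an illustration under some assumptions: words are taken to be separated by whitespace (your real delimiter may differ), s1 is assumed to be a java.util.Scanner, and the names WordReader and readWords are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class WordReader {
    // Reads all lines, splits each one on whitespace, and returns every word
    // (duplicates included) in the order in which it was read.
    static String[] readWords(Scanner s1) {
        List<String> words = new ArrayList<>();
        while (s1.hasNextLine()) {
            String line = s1.nextLine().trim();
            if (line.isEmpty()) {
                continue; // "".split(...) would yield a single empty string
            }
            words.addAll(Arrays.asList(line.split("\\s+")));
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A Scanner over a literal string stands in for the real file here.
        Scanner s1 = new Scanner("the quick  brown\n\nfox the");
        System.out.println(Arrays.toString(readWords(s1)));
    }
}
```

No intermediate string holding the whole file is ever built, so the total work is proportional to the size of the input.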
You wrote: "The final format I need is a string array of every word."
I read this as meaning that you want a list of all of the words in the file. If a word appears multiple times in the file, it should appear multiple times in the list.
On the other hand, if you want a list of the distinct words in the file, then you should use a Set rather than a List to accumulate the words. Depending on what you want to do with the words next, HashSet, TreeSet or LinkedHashSet would be appropriate.
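A minimal sketch of the distinct-words variant, again assuming whitespace-separated words and a Scanner named s1; the names DistinctWords and readDistinctWords are hypothetical. It uses LinkedHashSet, which keeps the words in first-seen order. Swapping in TreeSet would give sorted order, and HashSet gives no guaranteed order.

```java
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

public class DistinctWords {
    // Collects each distinct word once, in order of first appearance.
    static Set<String> readDistinctWords(Scanner s1) {
        Set<String> words = new LinkedHashSet<>();
        while (s1.hasNextLine()) {
            String line = s1.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines so no empty "word" is added
            }
            for (String word : line.split("\\s+")) {
                words.add(word); // add() ignores words already in the set
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A Scanner over a literal string stands in for the real file here.
        Scanner s1 = new Scanner("b a b\nc a");
        System.out.println(readDistinctWords(s1));
    }
}
```

If you still need an array at the end, calling toArray(new String[0]) on the set works the same way as it does for a list.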