I have a text corpus that I have to read, split, sort, and perform other operations on. Right at the start, when I split it, I see that the Scanner only reads one line. This is the code:

import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;

public class CorpusTest {
    public static void processCorpus(Scanner scanner) throws IOException {
        String line = "0";
        while (scanner.hasNextLine()) {
            line = scanner.nextLine();
        }

        String[] w = line.replaceAll("[^a-zA-Z\\s]", "").toLowerCase().split(" ");
        for (int i = 0; i < w.length; i++) {
            w[i].trim();
        }
        System.out.println("Word" + "\t" + "Frequency");
        System.out.println(Arrays.toString(w));
    }

    public static void main(String[] args) throws IOException {
        File temp = new File("input.txt");
        Scanner scanner = new Scanner(temp);
        CorpusTest.processCorpus(scanner);
    }
}

I tried adding:

String text = new Scanner( new File("input.txt") ).useDelimiter("\\A").next();

But I get errors because in the method above I am working with an array.

The while loop only reads the last line, which is no good.

nanachan

2 Answers

I'm not sure what your issue is, and it seems as if you might be trying to make things more difficult than they need to be. Why not simply read your lines in with the Scanner, one at a time, append them to a StringBuilder, and then, once all the text has been read in, convert it to a String and manipulate that String to your heart's content?
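
A minimal sketch of that approach, assuming the file is called input.txt as in the question. The class and method names (CorpusReader, readAll) are made up for illustration and are not from the original code:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class CorpusReader {
    // Hypothetical helper: appends every remaining line to a StringBuilder,
    // so that no line overwrites the one read before it.
    static String readAll(Scanner scanner) {
        StringBuilder text = new StringBuilder();
        while (scanner.hasNextLine()) {
            // nextLine() strips the line terminator, so add one back
            text.append(scanner.nextLine()).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumes input.txt is in the working directory
        try (Scanner scanner = new Scanner(new File("input.txt"))) {
            String text = readAll(scanner);
            System.out.println(text.length() + " characters read");
        }
    }
}
```

The resulting String holds the whole corpus, so the replaceAll/split steps from the question can then run on all of it rather than on a single line.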

Hovercraft Full Of Eels
  • I have way too many lines to read - it's a big text. (unless I don't understand what you mean) – nanachan Mar 20 '14 at 03:01
  • @nanachan: Then read as many lines as are needed. You are in complete control of your code and how many lines your Scanner reads. I have a strong suspicion that your problem lies elsewhere, and that this really is an XY problem in disguise. – Hovercraft Full Of Eels Mar 20 '14 at 03:02
  • So if I have 10000 lines, how do I do this? The while loop doesn't seem to work. – nanachan Mar 20 '14 at 03:05
  • @nanachan: then you need to do some debugging and first find the source of your problem. Again, I suspect that you have an XY problem going on, that your real problem has nothing to do with the issue that you've posted. – Hovercraft Full Of Eels Mar 20 '14 at 03:06
  • Is there any reason that you are specifically using a scanner? If you just need to tokenize the text why don't you use a String tokenizer? If you want to read in the text why don't you use a buffered reader? It seems like you are trying to use one variable to do everything. – j.jerrod.taylor Mar 20 '14 at 03:18
  • @HovercraftFullOfEels I updated my question. Now when I have the while loop, it "works" but reads only the last line. – nanachan Mar 20 '14 at 03:38
  • @j.jerrod.taylor I have to use a scanner in this exercise I am doing. – nanachan Mar 20 '14 at 03:39
  • @nanachan: chit, you're throwing out most all of what you read, simply ignoring it. With each loop of your while loop, any line read previously will be discarded and replaced by a new line. You don't need a debugger, you need common sense. At least store your lines in a StringBuilder. – Hovercraft Full Of Eels Mar 20 '14 at 03:43
  • @HovercraftFullOfEels thank you for the compliments. Some people are beginners. – nanachan Mar 20 '14 at 03:49

@user2864740 helped me out by pointing me to the right source. I used this instead of the loop at the beginning of my code:

String content = new Scanner(new File("input.txt")).useDelimiter("\\Z").next();
String[] w = content.replaceAll("[^a-zA-Z\\s]","").replaceAll("\n","").toLowerCase().split(" ");

Now it works.
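
One thing to watch with the snippet above: replaceAll("\n","") deletes line breaks instead of turning them into spaces, so unless a line ends with a space, its last word gets glued to the first word of the next line. A sketch that splits on any run of whitespace instead. The names CorpusWords and tokenize are hypothetical, and I'm assuming the same input.txt:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

class CorpusWords {
    // Hypothetical helper: keeps only ASCII letters and whitespace, lower-cases,
    // then splits on runs of whitespace (spaces, tabs, line breaks).
    static String[] tokenize(String content) {
        String cleaned = content.replaceAll("[^a-zA-Z\\s]", "").toLowerCase().trim();
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumes input.txt is in the working directory
        try (Scanner scanner = new Scanner(new File("input.txt"))) {
            // "\\Z" matches the end of input, so the file comes back as one token
            scanner.useDelimiter("\\Z");
            String content = scanner.hasNext() ? scanner.next() : "";
            System.out.println(Arrays.toString(tokenize(content)));
        }
    }
}
```

Splitting on "\\s+" also avoids the empty strings that split(" ") produces when two spaces sit next to each other.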

nanachan