
I'm working on an assignment that reads in a dictionary list and a paragraph, then counts how many times certain words show up in the paragraph, all while using LinkedLists and BSTs. We were given a regex to split apart the paragraph.txt file: "[\\s|\\pPunct]+". It does not work for me, so I am using [\\s, ?!]+ instead. That doesn't do everything I want either, and since regular expressions are outside the scope of this course, I don't know much about them.

I'm looking for a pattern which removes all periods, commas, and whitespace. [\\s, ?!]+ does the first two; however, if I have input like this, for example:

..some line here

more text here...

The blank line between them is not removed. I tried to filter it out when adding each word to my LinkedList:

    public static void insertParagraph(String[] strings) {
        for(int i = 0; i < strings.length; i++) {
            if(strings[i] != "" || strings[i] != " " || strings[i] != null)
                paragraph.insertFirst(strings[i].replaceAll("[^a-zA-Z'\\s]","").toLowerCase());
        }
    }

However, that if statement doesn't work either. Does anyone have any suggestions?

Matthew Brzezinski

1 Answer


Square brackets denote a character class, round brackets a capturing group.

Have a look at the Pattern class to see the predefined character classes.

"[\\s|\\pPunct]+" // wrong
"(\\s|\\p{Punct})+" // correct
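
To make the difference concrete, here is a minimal sketch of splitting the sample text from the question with the character-class form of the corrected pattern. The class and method names (`SplitDemo`, `splitWords`) are made up for illustration and are not part of the assignment.

```java
import java.util.Arrays;

class SplitDemo {
    // One or more whitespace or ASCII punctuation characters form a single delimiter.
    // Inside [...] no '|' is needed; it would be treated as a literal pipe character.
    static final String DELIMITERS = "[\\s\\p{Punct}]+";

    // Hypothetical helper: splits a paragraph into raw tokens.
    static String[] splitWords(String text) {
        return text.split(DELIMITERS);
    }

    public static void main(String[] args) {
        String text = "..some line here\n\nmore text here...";
        // Prints: [, some, line, here, more, text, here]
        System.out.println(Arrays.toString(splitWords(text)));
    }
}
```

The blank line is consumed as part of one delimiter run, so it produces no token of its own. The first element is an empty string because the text starts with a delimiter; `split` drops trailing empty strings but keeps a leading one, so empty tokens still need to be filtered out before insertion. Note also that `\p{Punct}` includes the apostrophe, so a word such as "don't" would be split in two.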
jlordo
  • Thanks, it seems to do the same thing as `[\\s,?!]+`, is there a way to remove the return key, or would you know what it is stored as in an array? – Matthew Brzezinski Jun 28 '13 at 16:52
  • @user1327636: what do you mean by _remove the return key_, do you mean a line break `\n`? It is part of `\\s`, as you can see in the link I've provided above. – jlordo Jun 28 '13 at 16:54
  • I'm not sure how to explain it, if you look into the OP, there's a line of text, then a blank line after that. How do you remove that line? For instance if I put a space in that line, my if statement `strings[i] != ""` catches it, but otherwise it doesn't. – Matthew Brzezinski Jun 28 '13 at 16:57
  • @user1327636: Read this **now**: [How do I compare strings in Java?](http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java) – jlordo Jun 28 '13 at 17:00
  • You can write `[\\s\\p{Punct}]+` too, or why not just `\P{L}+` or `\P{Alnum}+`, which seems safer for this kind of job. – Casimir et Hippolyte Jun 28 '13 at 17:21
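
Putting the comments together, the filter in the question can be sketched as follows. This is only an illustration under a few assumptions: the names `WordFilterDemo` and `keepWords` are hypothetical, and a `java.util.List` stands in for the asker's own `paragraph` LinkedList, whose API is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;

class WordFilterDemo {
    // Hypothetical stand-in for insertParagraph: collects the usable words
    // instead of inserting them into the custom LinkedList.
    static List<String> keepWords(String[] tokens) {
        List<String> words = new ArrayList<>();
        for (String token : tokens) {
            // != on strings compares references, not contents, and chaining the
            // checks with || makes the condition true for every value.
            // Check null first, then test the content with isEmpty().
            if (token != null && !token.isEmpty()) {
                words.add(token.toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "..Some line here\n\nMore text here...";
        // \P{L}+ treats every run of non-letter characters as one delimiter.
        String[] tokens = text.split("\\P{L}+");
        // Prints: [some, line, here, more, text, here]
        System.out.println(keepWords(tokens));
    }
}
```

Because the delimiter already removes everything that is not a letter, the extra `replaceAll` call from the question is not needed in this variant. The trade-off is that apostrophes are also treated as delimiters.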