
I have a text file hosted on a website. I am scanning that page and counting several kinds of characters, including whitespace produced by line breaks ("Enter" presses) and tabs.

I have found an answer for counting the number of lines and such.

How can I do this in Java? Counting whitespace is easy because there's a method for it, but as far as I know there isn't one for line breaks or tabs.

The page is http://homepage.lnu.se/staff/jlnmsi/java1/HistoryOfProgramming.txt and I'm counting uppercase and lowercase letters, as well as whitespace of any sort.

So far my output is correct for uppercase and lowercase letters but not for whitespace. I'm missing 15, which is exactly the number of line breaks.

import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class CountChar
{
public static void main(String[] args) throws IOException
{
    int upperCase = 0;
    int lowerCase = 0;
    int whitespace = 0;
    int others = 0;

    String url = "http://homepage.lnu.se/staff/jlnmsi/java1/HistoryOfProgramming.txt";
    URL page = new URL(url);
    Scanner in = new Scanner(page.openStream());
    while (in.hasNextLine())
    {
        whitespace++; // the fix, for later readers: nextLine() strips the line break, so count it here
        String line = in.nextLine();
        for (int i = 0; i < line.length(); i++)
        {
            if (Character.isUpperCase(line.charAt(i)))
            {
                upperCase++;
            }
            else if (Character.isLowerCase(line.charAt(i)))
            {
                lowerCase++;
            }
            else if (Character.isWhitespace(line.charAt(i)))
            {
                whitespace++;
            }               
            else
            {
                others++;
            }
        }
    }
    System.out.print(lowerCase + " " + upperCase + " " + whitespace + " " + others);

}
}
George

2 Answers


If we assume that your data is stored in a String called data:

String[] arrayOfLines = data.split("\r?\t?\n");
int length = arrayOfLines.length - 1;

length would give the number of newline characters in data.
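
As a hedged sketch of how `data` could be obtained and counted without `split`: the code below reads a whole stream into one String and then counts a given character directly. The class name `NewlineCount`, the helpers `readAll` and `countChar`, and the sample text are all hypothetical; an in-memory stream stands in for `page.openStream()`, and UTF-8 is an assumed encoding. Direct counting is used here because `String.split` discards trailing empty strings, so `length - 1` can undercount when the text ends with line breaks.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class NewlineCount {
    // Reads the entire stream into one String. The "\\A" delimiter only
    // matches at the start of input, so next() returns everything at once.
    static String readAll(InputStream in) {
        try (Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    // Counts how many times the target character occurs in data.
    static int countChar(String data, char target) {
        int count = 0;
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample text (an assumption); a real run would pass page.openStream().
        byte[] sample = "first\tline\r\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        String data = readAll(new ByteArrayInputStream(sample));

        System.out.println("line feeds: " + countChar(data, '\n'));
        System.out.println("tabs: " + countChar(data, '\t'));
        System.out.println("carriage returns: " + countChar(data, '\r'));
    }
}
```

Unlike `Scanner.nextLine()`, this keeps the line terminators in `data`, so they can be counted like any other character.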

Robin Daugherty
  • The pattern `"\r?\n"` should do the same thing – OneCricketeer Feb 03 '17 at 22:37
  • This is wrong because the OP's original post said that s/he needs to count tabs as well as lines (which you did well with `\n` and `\r`). In other words, you need to add permutations with `\t`. As per @cricket_007, `"\r?\t?\n"` is the correct answer – PMARINA Feb 03 '17 at 22:57
  • I did not use data, I do not even know how. I'm reading "straight" from the page. Also, would this technique count like purposely broken? It seems it would only count the length of each line of text in the array? Sorry I'm a novice at this. – George Feb 03 '17 at 23:26
  • @George I'm sorry, I don't understand what you're doing. What do you mean by "count like purposely broken"? What this does is make an array that is created from a String. The string is broken into parts of the array based on the characters \r, \t, and \n. \r is carriage return, \n is a newline character, and \t is a tab character. Then, we just find the length of the array to find the number of parts and use that to determine the number of \n, \t or \r's (all three). Note that the post is incorrect in that you'd have to do length-1 because the length gives #parts, not things in between. – PMARINA Feb 04 '17 at 01:12
  • The string "\r?\t?\n" will match a single possible tab that comes _between_ the optional \r and the \n at the end of a line. It's entirely unclear from the question how tabs fit into the pattern, but this is almost certainly not correct, and it doesn't match with the new explanation you added @PMARINA. – Robin Daugherty Feb 04 '17 at 15:39
  • To count a character or pattern, use [one of the methods provided in this answer](http://stackoverflow.com/a/35242882/1589422). Using `split` is _wrong_, as explained in the comments on [this answer](http://stackoverflow.com/a/37317254/1589422). – Robin Daugherty Feb 04 '17 at 15:49

You can use the Pattern and Matcher classes in the standard library: build a regular expression for all the characters you are looking for and count the occurrences with find(). I don't know whether this is more complex than what you require; you could also just split the string on the whitespace characters you need (similar to Krishna Chikkala's answer).
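
A minimal sketch of the Pattern/Matcher approach, assuming the text is already held in a String. The class name `WhitespaceCount`, the helper `countMatches`, and the sample text are hypothetical and only for illustration; `\s` is Java's predefined whitespace class, which covers space, tab, line feed and carriage return among others.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceCount {
    // Counts every non-overlapping match of regex in text via Matcher.find().
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample text (an assumption), containing spaces, a tab and two line breaks.
        String data = "One line\twith a tab\nSecond line\r\nThird";

        System.out.println("all whitespace: " + countMatches("\\s", data));
        System.out.println("line feeds: " + countMatches("\\n", data));
        System.out.println("tabs: " + countMatches("\\t", data));
    }
}
```

Because each match is counted as it is found, text that ends in one or more line breaks is counted in full, which a `split`-based count would miss.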

Community
maccoda