
I have this assignment for my university: https://cs1331.gitlab.io/fall2018/hw2/hw2-source-model.html. I wrote the code, but when I run the program I get this message in the console:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3107)
    at java.base/java.lang.String.substring(String.java:1873)
    at homework1.SourceModel.main(SourceModel.java:127)

Here is my code for this assignment, with comments:

package homework1;

import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;


public class SourceModel {

    //initialize variables so they can be accessed everywhere
    private String modelName;
    private int[][] characterCount;
    private double[] rowCount;
    private double[][] probability;

    /**
     * 
     * @param name the name of the corpus
     * @param fileName the file name of the corpus
     */
    public SourceModel(String name, String fileName) {
        modelName = name;
        characterCount = new int[26][26];
        rowCount = new double[26];
        probability = new double[26][26];
        System.out.println("Training " + name + "model...");

        try {
            Scanner scan = new Scanner(new File(fileName));
            String temp = "";

            //append all of the text
            while (scan.hasNext()) {
                temp += scan.next();
            }

            //only keeps the letters and makes them lowercase
            temp = temp.replaceAll("[^A-Za-z]+", "").toLowerCase();
            System.out.println(temp);
            //iterates through each letter, then adds each letter
            //pair to its respective row and column

            for (int i = 0; i < (temp.length() - 1); i++) {
                char firstLetter = temp.charAt(i);
                char secondLetter = temp.charAt(i + 1);

                //index based on ASCII values
                characterCount[(int) firstLetter - 97][(int) secondLetter - 97]++;
                rowCount[(int) firstLetter - 97]++;
            }

            //calculates the probability by dividing the count
            //by the total counts in each row 
            for (int i = 0; i < probability.length; i++) {
                for (int j = 0; j < probability[i].length; j++) {
                    if (rowCount[i] == 0) {
                        rowCount[i] = 0.01;
                    }
                    probability[i][j] = (((double) characterCount[i][j]) / rowCount[i]);

                    if (probability[i][j] == 0) {
                        probability[i][j] = 0.01;
                    }
                }
            }
            System.out.println("done");

        } 
        catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    /**
     * 
     * @return a string which contains the name
     */
    public String getName() {
        return modelName;
    }

    /**
     * @return a string with the matrix 
     */
    public String toString() {
        String matrix = "";
        matrix += "";
        for (int i = 97; i < 123; i++) {
            matrix += "  ";
            matrix += (char) i;
        }
        matrix += ("\n");
        for (int i = 0; i < probability.length; i++) {
            matrix += ((char) (i + 97) + " ");
            for (int j = 0; j < probability[i].length; j++) {
                matrix += String.format("%.2f", probability[i][j]);
                matrix += ("");
            }
            matrix += "\n";
        }
        return matrix;
    }

    /**
     * 
     * @param test a set of letters to test
     * @return the probability for the word 
     */
    public double probability(String test) {
        test = test.replaceAll("[^A-Za-z]+", "").toLowerCase();
        double stringProbability = 1.0;
        for (int i = 0; i < test.length() - 1; i++) {
            int firstIndex = (int) (test.charAt(i)) - 97;
            int secondIndex = (int) (test.charAt(i + 1)) - 97;
            stringProbability *= probability[firstIndex][secondIndex];
        }
        return stringProbability;
    }

    /**
     * 
     * @param args the command line arguments 
     */
    public static void main(String[] args) {
        SourceModel[] models = new SourceModel[args.length - 1];
        for (int i = 0; i < args.length - 1; i++) {
            models[i] = new SourceModel(args[i].substring(0, args[i].indexOf(".")), args[i]);
        }
        System.out.println("Analyzing: " + args[args.length - 1]);
        double[] normalizedProbability = new double[args.length - 1];
        double sumProbability = 0;
        for (int i = 0; i < args.length - 1; i++) {
            sumProbability += models[i].probability(args[args.length - 1]);
        }
        //normalize the probability in respect to the values given
        for (int i = 0; i < normalizedProbability.length; i++) {
            normalizedProbability[i] = models[i].probability(args[args.length - 1]) / sumProbability;
        }
        int highestIndex = 0;
        for (int i = 0; i < args.length - 1; i++) {
            System.out.print("Probability that test string is");
            System.out.printf("%9s: ", models[i].getName());
            System.out.printf("%.2f", normalizedProbability[i]);
            System.out.println("");
            if (normalizedProbability[i] > normalizedProbability[highestIndex]) {
                highestIndex = i;
            }
        }
        System.out.println("Test string is most likely " + models[highestIndex].getName() + ".");
    }
}
halfer
  • please include your code – elbraulio Jan 30 '19 at 18:34
  • So "HipHop" and "Lisp" are languages now? lol – Zephyr Jan 30 '19 at 18:36
  • `substring(0, args[i].indexOf("."))` what if `args[i]` doesn't have `.`? `indexOf(".")` would return `-1` which is invalid value for `substring` as you see in exception message. – Pshemo Jan 30 '19 at 18:41
  • @Pshemo Beat me to it, I was just writing an answer pointing this out. Should this be answered or marked as a duplicate? – EJoshuaS - Stand with Ukraine Jan 30 '19 at 18:43
  • @EJoshuaS I am looking for duplicate. Ping me if you find it faster so I could vote on it. – Pshemo Jan 30 '19 at 18:46
  • @Pshemo Maybe [this one](https://stackoverflow.com/questions/13714446/java-indexof-returns-1)? I assume it would be pointless for both of us to mark this as a duplicate since you already wield Mjolnir for Java. – EJoshuaS - Stand with Ukraine Jan 30 '19 at 18:47
  • @EJoshuaS I am not sure. IMO it should contain explanation of what this exception means in relation to substring method (maybe it is faster to answer it...) – Pshemo Jan 30 '19 at 18:51
  • @Pshemo Looking at the text of the assignment again, I'm actually not convinced that that's the root cause of the problem. The assignment states that all of the file names should have a `.` in them, so this shouldn't be happening in the first place - there must be a problem with one of the command line parameters. – EJoshuaS - Stand with Ukraine Jan 30 '19 at 19:01
  • I'm actually going to vote +1 on this because the root cause of the bug ends up being non-obvious. – EJoshuaS - Stand with Ukraine Jan 30 '19 at 19:32
  • Out of curiosity, why remove the code from the question? Granted, the problem ends up being the command line parameters, but if anything I'd leave the code in and add the sample of the command line parameters so that the answers make sense if anyone runs into a similar problem in the future. – EJoshuaS - Stand with Ukraine Jan 30 '19 at 20:11

2 Answers

2

Others have already pointed this out, but for this line:

models[i] = new SourceModel(args[i].substring(0, args[i].indexOf(".")), args[i]);

the substring call is what throws the exception: indexOf returns -1 if the . isn't found, and -1 is not a valid end index for substring.
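To see the failure in isolation, here is a minimal sketch. The class name SubstringDemo, the helper beforeDot, and the sample arguments are made up for illustration; only indexOf and substring come from the original line. The exact wording of the exception message can differ between JDK versions, so the sketch prints only the exception type.

```java
// Minimal demo; "GB" is a hypothetical argument that contains no dot.
class SubstringDemo {
    // Takes the text before the first dot, the same way the line above does.
    static String beforeDot(String arg) {
        return arg.substring(0, arg.indexOf("."));
    }

    public static void main(String[] args) {
        System.out.println("english.corpus".indexOf(".")); // 7
        System.out.println(beforeDot("english.corpus"));   // english
        System.out.println("GB".indexOf("."));             // -1
        try {
            beforeDot("GB"); // becomes substring(0, -1)
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getClass().getSimpleName());
        }
    }
}
```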

In this case, though, the code actually isn't the problem, since the assignment states that you can assume that the file names are of the form <source-name>.corpus. That means every file-name parameter (everything except the final test string) should have a . in it, so this shouldn't be happening.

I'd check to see what command line parameters you're passing. One guess I have is that you might have a file name with a space in it or something. For example, if you passed English GB.corpus, then this would show up as 2 separate arguments (one of which doesn't have a .).

Edit: As @Pshemo pointed out in the comments, if you have a file name that has a space in it, you can just put it in quotes so that it'll be interpreted as a single command line parameter - for example, instead of English GB.corpus, write "English GB.corpus". That'll prevent the exception.
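One way to check what the program actually receives is to print every argument with its index. This is only a diagnostic sketch: the class name ArgDump and the method describe are hypothetical and not part of the assignment. It assumes the same convention as the program in the question, where the last argument is the test string and every earlier one is a file name.

```java
// Hypothetical diagnostic class; not part of the assignment.
class ArgDump {
    // Builds one line per argument and flags file-name arguments
    // (all but the last) that contain no dot.
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            boolean isFileArg = i < args.length - 1;
            sb.append("args[").append(i).append("] = \"")
              .append(args[i]).append("\"");
            if (isFileArg && !args[i].contains(".")) {
                sb.append("  <-- no dot");
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```

Running it with the same arguments as the real program shows whether a file name was split in two: an unquoted English GB.corpus appears as two entries, while "English GB.corpus" appears as one.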

0

In your main method, you have:

args[i].indexOf(".")

The dot (.) is not found so it returns -1.

You try to create a substring:

models[i] = new SourceModel(args[i].substring(0, args[i].indexOf(".")), args[i]);

But since args[i].indexOf(".") returned -1, which is not a valid end index, substring throws the exception.

What you can do is check whether the dot (.) exists, and only create the model if it does:

if (args[i].contains(".")) {
    models[i] = new SourceModel(args[i].substring(0, args[i].indexOf(".")), args[i]);
}
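One caveat with the guard above: when the dot is missing, models[i] stays null, and the later calls to models[i].probability(...) in main would then fail with a NullPointerException. A possible alternative, sketched here under my own assumptions, is a small helper that falls back to the whole argument when there is no dot. The class name SourceNames and the method sourceName are hypothetical, not something the assignment defines.

```java
// Hypothetical helper class; the names are illustrative only.
class SourceNames {
    // Returns the part before the first dot, or the whole
    // argument unchanged when it contains no dot.
    static String sourceName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot >= 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(sourceName("english.corpus")); // english
        System.out.println(sourceName("GB"));             // GB
    }
}
```

In main this would be used as new SourceModel(sourceName(args[i]), args[i]). If the argument is not a real file, the constructor in the question then reports a FileNotFoundException, which points more directly at a bad command-line argument.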
MevlütÖzdemir
  • Are you sure? Can you print the args[i] to the console and show us the output? – MevlütÖzdemir Jan 30 '19 at 19:00
  • @GabrielaI.Haras "but it even has a dot" what makes you think so? Please notice that with `for (int i = 0; i < args.length - 1; i++)` you are iterating and testing all arguments except last one. Is dot present at all of those arguments? – Pshemo Jan 30 '19 at 19:02