0

I have a file with several hundred stopwords. I want to be able to check whether the file has been modified, for example by a user, or whether it has been corrupted.

My current idea is to check whether the number of lines is correct. I could also check whether the total number of characters matches the expected value, or even keep the whole stopword list in memory and check that every word is in the file. All three approaches seem inefficient and/or fragile, so I am asking whether there is a better way of doing it.

What I am thinking of implementing:

    private static final int WORD_COUNT = 354;

    public static boolean stopwordsCorrupted(File file) {
        int numOfLines = countLines(file);

        return WORD_COUNT != numOfLines;
    }
Aki K
  • Check out this: http://en.wikipedia.org/wiki/Checksum This uses a hash function of the file to check that no alterations have been made – wastl May 24 '14 at 14:22
  • Your suggested method will not work if someone modified your list; it only checks if someone added or removed a word. Try using a checksum. – Jongware May 24 '14 at 14:22
  • Why don't you just compute the hash of the file, and compare against a reference? – Oliver Charlesworth May 24 '14 at 14:23
  • In line with @wastl's comment, check this: http://stackoverflow.com/questions/304268/getting-a-files-md5-checksum-in-java – Pablo Lozano May 24 '14 at 14:28
  • @wastl I have no idea why this did not come to mind. So I basically compute the checksum, keep it in memory and then just compare it. By the way, you should add this as an answer. – Aki K May 24 '14 at 14:48
  • @wastl Also in order to keep it in memory do I have to keep the byte array as is, therefore going to the debugger and writing down every element of the array? – Aki K May 24 '14 at 15:08
  • Well, basically yes: to know your checksum in the first place you would need to do that, or something similar – wastl May 24 '14 at 15:45
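On the question of how to keep the reference value: one option that avoids copying array elements out of a debugger is to print the digest once as a hex string and paste that string into a constant. The following is only a sketch of that idea. The class name `StopwordsDigest`, the helper `sha256Hex` and the placeholder constant are made up for illustration, and SHA-256 is an assumed choice (the linked question uses MD5).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StopwordsDigest {
    // Hypothetical placeholder: replace with the hex string that main() prints
    // for the known-good stopwords file.
    static final String EXPECTED_SHA256 = "<paste hex digest here>";

    // Reads the whole file and returns its SHA-256 digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True if the file's digest differs from the stored reference value.
    static boolean stopwordsCorrupted(Path file) throws IOException, NoSuchAlgorithmException {
        return !EXPECTED_SHA256.equals(sha256Hex(file));
    }

    // Run once against the known-good file to obtain the reference string.
    public static void main(String[] args) throws Exception {
        System.out.println(sha256Hex(Paths.get(args[0])));
    }
}
```

Note that a constant compiled into the program only guards against accidental changes or casual edits; anyone who can modify both the file and the program can update the constant as well.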

2 Answers

1

Java WatchService API might be helpful for your problem.

Juvanis
  • Unfortunately I am not that interested in live polling of a file or directory; I am looking for more of a one-time check. – Aki K May 24 '14 at 14:49
1

Check out this: http://en.wikipedia.org/wiki/Checksum This uses a hash function of the file to check that no alterations have been made.

Here you also have an example of how to use it.
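If the goal is only to detect accidental corruption rather than deliberate tampering, a plain checksum in the sense of the linked article may be enough. Here is a minimal sketch using `java.util.zip.CRC32` from the standard library. The class and method names are hypothetical, and the expected value is assumed to have been computed once from the known-good file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class StopwordsChecksum {
    // Computes the CRC-32 checksum of the file's bytes.
    static long crc32Of(Path file) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    // True if the file's checksum differs from the expected reference value.
    static boolean stopwordsCorrupted(Path file, long expectedCrc) throws IOException {
        return crc32Of(file) != expectedCrc;
    }
}
```

The reference value is a single `long`, so it is easy to store as a constant. CRC-32 is not collision resistant, so a cryptographic hash is the safer choice if someone might alter the file on purpose.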

wastl