I need to read a file that contains 2 sentences to compare and return a number between 0 and 1. If the sentences are exactly the same, it should return 1 (true), and if they are totally opposite, it should return 0 (false). If the sentences are similar but some words have been replaced with synonyms or close equivalents, it should return .25, .5 or .75. The text file is formatted like this:

______________________________________
Text: Sample 

Text 1: It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.

Text 20: It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines
// Should score high point but not 1

Text 21: It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines
// Should score lower than text20

Text 22: I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night.
// Should score lower than text21 but NOT 0

Text 24: It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats.
// Should score a 0!
________________________________________________

I have a file reader, but I am not sure of the best way to store each line so that I can compare them. For now, the file is read and each line is printed to the screen. What is the best way to store these lines and then compare them to get my desired number?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class implement
{
    public static void main(String[] args)
    {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("textfile.txt")))
        {
            String strLine;

            // Print each line of the file as it is read
            while ((strLine = br.readLine()) != null)
            {
                System.out.println(strLine);
            }
        }
        catch (IOException e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
skaffman
mrjeck2
  • These are two completely different things; please be specific about what you are asking: (1) How to store the data from files? **OR** (2) How can I compare two strings to determine what their score should be? In the second case, we should hear your first thoughts, an explanation of what you have already tried, and why it fails. – amit Apr 27 '12 at 14:17
  • Also, you should avoid naming your class `implement`, mostly for 2 reasons: (1) Give it a meaningful name that describes what it does. (2) The convention in Java is that class names start with an upper-case letter, and `implement` starts with lower case :\ – amit Apr 27 '12 at 14:21
  • How can I store each line as a string so I can check the equality using == – mrjeck2 Apr 27 '12 at 14:33
  • actually, using `operator==` will check for identity, not equality - you should check for equality using [`equals()`](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#equals%28java.lang.Object%29) – amit Apr 27 '12 at 14:35

1 Answer

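Save them in an ArrayList.

import java.util.ArrayList;
import java.util.List;

// Typed list, so no casts are needed when reading the lines back
List<String> list = new ArrayList<>();
// Read the file
// Inside the while loop:
list.add(strLine);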
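A minimal sketch of the storing step, assuming the file layout shown in the question (lines such as `Text 20: ...`, with blank lines, `//` notes and separator lines in between). The class name `TextStore`, the method `parse` and the choice to key each text by its number are assumptions made for illustration, not something the question prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextStore {
    // Matches lines like "Text 20: some sentences"; group 1 is the number, group 2 the text.
    private static final Pattern LABELLED = Pattern.compile("^Text\\s+(\\d+):\\s*(.+)$");

    // Keeps only the numbered "Text N:" lines, keyed by N, in file order.
    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> texts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LABELLED.matcher(line.trim());
            if (m.matches()) {
                texts.put(Integer.parseInt(m.group(1)), m.group(2).trim());
            }
            // Everything else (blank lines, "//" notes, separators) is skipped.
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // "textfile.txt" is the file name used in the question.
        List<String> lines = Files.readAllLines(Paths.get("textfile.txt"));
        parse(lines).forEach((number, text) -> System.out.println(number + " -> " + text));
    }
}
```

With the texts in a map, the reference text and each candidate can be fetched by number, for example `texts.get(1)` and `texts.get(20)`.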

To check each word in a sentence, simply remove the punctuation, split on spaces, and search for each word in the sentence you are comparing against. I would suggest ignoring words of 2 or 3 characters, but that is up to your discretion.
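The word-by-word check described above could be sketched roughly as follows. The class name `WordOverlap`, the `minLength` parameter and the decision to report the result as a fraction of the reference words are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class WordOverlap {
    // Lower-cases the sentence, strips punctuation, splits on whitespace,
    // and keeps only words with at least minLength characters.
    static List<String> words(String sentence, int minLength) {
        String cleaned = sentence.toLowerCase(Locale.ROOT)
                                 .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");
        List<String> result = new ArrayList<>();
        for (String w : cleaned.trim().split("\\s+")) {
            if (!w.isEmpty() && w.length() >= minLength) {
                result.add(w);
            }
        }
        return result;
    }

    // Fraction of the reference words that also occur somewhere in the candidate.
    static double overlap(String reference, String candidate, int minLength) {
        List<String> ref = words(reference, minLength);
        if (ref.isEmpty()) {
            return 0.0;
        }
        Set<String> cand = new HashSet<>(words(candidate, minLength));
        int matched = 0;
        for (String w : ref) {
            if (cand.contains(w)) {
                matched++;
            }
        }
        return (double) matched / ref.size();
    }

    public static void main(String[] args) {
        String a = "It was a dark and stormy night.";
        String b = "It was a murky and stormy night.";
        System.out.println(overlap(a, b, 4)); // minLength 4 ignores words of 1 to 3 characters
    }
}
```

Two things to keep in mind with this kind of check: it ignores word order, so a reordered text scores the same as the original, and dropping short words also drops "not", which is the word that separates Text 24 from Text 1.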

Then save the lines to the list and compare them however you want. To compare similar words you will need a database that lets you check words efficiently, i.e. a hash table. Once you have this, you can look words up fairly quickly. Next, each word in this hash table will need a thesaurus entry linking it to similar words. Then take the similar words for the key words in each sentence and search for them in the sentence you are comparing against. Obviously, before you search for similar words you would want to compare the two actual sentences directly. In the end, you will have to build a more advanced data structure yourself to do more than direct comparisons.
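One possible shape for the hash table with a thesaurus attached, as a rough sketch. The class `SynonymScorer`, its methods, the hand-entered synonym groups and the `SYNONYM_CREDIT` weight of 0.5 are all assumptions; a real thesaurus would have to come from an external word list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SynonymScorer {
    // Assumption: a synonym counts for half of an exact match, so that
    // a text with swapped-in synonyms scores high but below 1.
    static final double SYNONYM_CREDIT = 0.5;

    // Hash table from a word to the set of words considered similar to it.
    private final Map<String, Set<String>> thesaurus = new HashMap<>();

    // Registers a group of interchangeable words; every word maps to the whole group.
    void addGroup(String... group) {
        Set<String> shared = new HashSet<>(Arrays.asList(group));
        for (String w : group) {
            thesaurus.computeIfAbsent(w, k -> new HashSet<>()).addAll(shared);
        }
    }

    boolean isSynonym(String a, String b) {
        Set<String> similar = thesaurus.get(a);
        return !a.equals(b) && similar != null && similar.contains(b);
    }

    // Average credit per reference word: 1 for an exact match in the candidate,
    // SYNONYM_CREDIT for a synonym, 0 otherwise.
    double score(List<String> reference, List<String> candidate) {
        if (reference.isEmpty()) {
            return 0.0;
        }
        if (reference.equals(candidate)) {
            return 1.0; // direct comparison first, as suggested above
        }
        double total = 0.0;
        for (String r : reference) {
            double best = 0.0;
            for (String c : candidate) {
                if (r.equals(c)) {
                    best = 1.0;
                    break;
                }
                if (isSynonym(r, c)) {
                    best = SYNONYM_CREDIT;
                }
            }
            total += best;
        }
        return total / reference.size();
    }
}
```

For example, after `addGroup("dark", "murky")` the word lists `[dark, stormy, night]` and `[murky, stormy, night]` get one synonym match and two exact matches, which averages to 2.5 / 3.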

John Sykor
  • What would he do when he reads the 11th line? The size is not known in advance.. – amit Apr 27 '12 at 14:18
  • use an arraylist or some other type of dynamic structure – rflood89 Apr 27 '12 at 14:25
  • Well, you can always use array list or the lines can be counted then read in. – John Sykor Apr 27 '12 at 14:26
  • The best option is always to use a linked list because of the dynamic elements. – John Sykor Apr 27 '12 at 14:27
  • @JohnSykor: That is exactly what I was hinting at. There are more appropriate data structures than an array for this task. Also: I think the OP is actually more interested in "how to compute the score for each string", but he has not answered my comment. – amit Apr 27 '12 at 14:27
  • Prolly, but I love only answering what they ask in hopes they learn to be more specific in their questions. – John Sykor Apr 27 '12 at 14:32
  • @JohnSykor: But he does ask it: `What is the best way to store these **and then compare them to get my desired number?**`. Also, you should avoid using raw types, and prefer the generic `ArrayList<String>`. – amit Apr 27 '12 at 14:34
  • Off the top of my head, what he intends to do is a lot more in-depth than what he can handle lol. – John Sykor Apr 27 '12 at 14:35
  • Ok I will fix up my answer to what will give him an idea :D – John Sykor Apr 27 '12 at 14:36
  • that's a start thanks for the help my last question is how am I supposed to get the .25 .5 .75 scores? – mrjeck2 Apr 27 '12 at 14:46
  • You will need a system. For example, keep a count of the words in an item and then the number that match: `double score = (double) matchWords / wordCount; if (score > .8) score = 1; else if (score > .6) score = .75;` etc. There is also an issue with duplicate words, so if you want an accurate result for duplicates you will want to keep a count of them. Say the word "pie" appears only once in sentence 1 but twice in sentence 2. – John Sykor Apr 27 '12 at 14:56
  • Btw, that is just my implementation of rounding. You could always just give the exact matches or just round the .05 on every digit giving a greater accuracy but not an infinite one. Rounding: http://stackoverflow.com/questions/153724/how-to-round-a-number-to-n-decimal-places-in-java – John Sykor Apr 27 '12 at 15:06
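The scoring idea from the last few comments could be sketched like this, assuming the words have already been extracted into lists. Note that this sketch swaps the hand-picked thresholds from the comment for plain rounding to the nearest quarter, and handles duplicates by counting occurrences; the class `QuarterScore` and its method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QuarterScore {
    // Counts how many times each word occurs.
    static Map<String, Integer> counts(List<String> words) {
        Map<String, Integer> result = new HashMap<>();
        for (String w : words) {
            result.merge(w, 1, Integer::sum);
        }
        return result;
    }

    // Matched occurrences divided by the number of reference words.
    // A word that appears twice in the reference needs two occurrences
    // in the candidate to count twice.
    static double ratio(List<String> reference, List<String> candidate) {
        if (reference.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> cand = counts(candidate);
        int matched = 0;
        for (Map.Entry<String, Integer> e : counts(reference).entrySet()) {
            matched += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
        }
        // Cast before dividing: int / int would truncate to 0 or 1.
        return (double) matched / reference.size();
    }

    // Snaps a raw ratio to the nearest of 0, .25, .5, .75, 1.
    static double toQuarter(double raw) {
        double clamped = Math.max(0.0, Math.min(1.0, raw));
        return Math.round(clamped * 4) / 4.0;
    }
}
```

Rounding to the nearest quarter gives different cut-offs from the `> .8` and `> .6` thresholds suggested in the comment, so pick whichever boundaries fit the expected scores for the sample texts.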