
I am new to Java. I need to count how often the words in a sentence are associated with each other. For example, for the sentence "Dog is a Dog and Cat is a Cat", the first row of the final association counts would be: Dog-Dog (0), Dog-is (2), Dog-a (2), Dog-and (1), Dog-Cat (2)

and so on.

This amounts to building an association matrix. Any suggestions on how to implement it?

  • Interesting! Can you elaborate on what this is used for, and also why the count is 2 for "Dog-is"? See if this process helps: http://it.toolbox.com/blogs/enterprise-solutions/building-an-association-matrix-15499 – Aravind Yarram Dec 19 '10 at 00:39
  • @Pangea: Well, in the sentence, "Dog" occurs together with 2 "is", which is why the Dog-is pair gets the value 2. Drawing the matrix as a table is easy, but I am lost when it comes to the implementation. – Rushdi Shams Dec 19 '10 at 06:50
  • I am sorry, but I see "Dog is" occurring only once, right? "Dog is a Dog and Cat is a Cat" – Aravind Yarram Dec 19 '10 at 11:59

2 Answers

3

Thanks, Roman. I can split the sentences into words:

import java.text.BreakIterator;
import java.util.Locale;

String target = "Dog is a Dog and Cat is a Cat";
Locale currentLocale = new Locale("en", "US");
BreakIterator wordIterator = BreakIterator.getWordInstance(currentLocale);
// Create the sentence iterator
BreakIterator bi = BreakIterator.getSentenceInstance();
bi.setText(target);

int index = 0; // start offset of the current sentence
while (bi.next() != BreakIterator.DONE) {
    String sentence = target.substring(index, bi.current());
    System.out.println(sentence);

    wordIterator.setText(sentence);
    int start = wordIterator.first();
    int end = wordIterator.next();
    while (end != BreakIterator.DONE) {
        String word = sentence.substring(start, end);
        // Print only real words; skip whitespace and punctuation tokens
        if (Character.isLetterOrDigit(word.charAt(0))) {
            System.out.println(word);
        }
        start = end;
        end = wordIterator.next();
    }
    index = bi.current();
}

But I did not understand your other two points. Thanks.
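
For reference, here is a minimal sketch of how the word splitting above could feed the remaining two steps (generating pairs, then merging equal pairs). The class name PairCounter, its method names, and the "a-b" string key are illustrative choices that do not come from this thread, and the sketch assumes Java 8 or later for Map.merge.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, named here for illustration only.
class PairCounter {

    // Step 1: collect the words of the text instead of printing them.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Keep only real words; skip whitespace and punctuation tokens.
            if (Character.isLetterOrDigit(token.charAt(0))) {
                result.add(token);
            }
        }
        return result;
    }

    // Steps 2 and 3: visit every pair of positions and count equal pairs together.
    static Map<String, Integer> pairCounts(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < words.size() - 1; i++) {
            for (int j = i + 1; j < words.size(); j++) {
                String a = words.get(i);
                String b = words.get(j);
                // Order the two words so that "is-Dog" and "Dog-is" share one key.
                String key = a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a;
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(pairCounts(words("Dog is a Dog and Cat is a Cat")));
    }
}
```

Counting pairs of positions this way gives 4 for Dog-is in the example sentence (two "Dog" times two "is") and 1 for Dog-Dog, so it is not the same rule as the expected row in the question.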

0
  1. Split the sentence into separate words.
  2. Generate pairs.
  3. Merge the same pairs.

It's as simple as:

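import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Example input, taken from the question
String sentence = "Dog is a Dog and Cat is a Cat";

// First step: split the sentence into words
String[] words = sentence.split("\\s+");

// Second step: generate every pair of word positions (i < j)
List<List<String>> pairs =
    new ArrayList<List<String>>(words.length * (words.length - 1) / 2);
for (int i = 0; i < words.length - 1; i++) {
    for (int j = i + 1; j < words.length; j++) {
        List<String> pair = Arrays.asList(words[i], words[j]);
        Collections.sort(pair); // so that [is, Dog] and [Dog, is] compare equal
        pairs.add(pair);
    }
}

// Third step: merge equal pairs and count them
Map<List<String>, Integer> pair2count = new LinkedHashMap<List<String>, Integer>();
for (List<String> pair : pairs) {
    if (pair2count.containsKey(pair)) {
        pair2count.put(pair, pair2count.get(pair) + 1);
    } else {
        pair2count.put(pair, 1);
    }
}

// Output
System.out.println(pair2count);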
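
Note that counting pairs of positions as above gives 4 for Dog and is, and 1 for Dog and Dog, while the question expects 2 and 0. Below is a minimal sketch of one possible reading of the question that reproduces the expected first row. It assumes that the cell for (A, B) is the number of times B occurs in the sentence and that the diagonal is 0. That rule is a guess from the example, and the class and method names are made up for illustration. It needs Java 8 or later for Map.merge.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class; the counting rule is an assumption inferred from the question.
class AssociationMatrix {

    // Counts how often each distinct word occurs, in first-seen order.
    static Map<String, Integer> wordCounts(String sentence) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Assumed rule: cell (a, b) = occurrences of b in the sentence, 0 on the diagonal.
    static Map<String, Map<String, Integer>> build(String sentence) {
        Map<String, Integer> counts = wordCounts(sentence);
        Map<String, Map<String, Integer>> matrix = new LinkedHashMap<>();
        for (String a : counts.keySet()) {
            Map<String, Integer> row = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> b : counts.entrySet()) {
                row.put(b.getKey(), a.equals(b.getKey()) ? 0 : b.getValue());
            }
            matrix.put(a, row);
        }
        return matrix;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> matrix = build("Dog is a Dog and Cat is a Cat");
        for (Map.Entry<String, Map<String, Integer>> row : matrix.entrySet()) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
        // First line printed: Dog -> {Dog=0, is=2, a=2, and=1, Cat=2}
    }
}
```

If the intended rule is different (for example, counting only adjacent words, as the comments on the question suggest), only the expression that fills each cell in build would need to change.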
Roman