
I want to write a method in Java that takes an input string and lists the words (case-insensitive) that occur more than once, along with their counts.

For example:

input>> "I have two cars in my garage and my dad has one car in his garage"

It should produce the following output:

output>> my -- repeated 2 times
         in -- repeated 2 times
         ...

Here is my code:

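import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class classduplicate {
    // \1 refers back to group 1, so this only matches a word that is immediately repeated
    private static final String REGEX = "\\b([A-Z]+)\\s+\\1\\b";
    private static final String INPUT = "Cat cat cat cattie cat";

    public static void main(String[] args) {
        // CASE_INSENSITIVE lets [A-Z] match lower case and makes "Cat cat" a repeat
        Pattern p = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(INPUT);   // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            // print the match itself; calling m.find() again here would skip every other match
            System.out.println(m.group());
        }
        System.out.println("Match number " + count);
    }
}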
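A back-reference pattern like the one above can only report repeats that sit right next to each other, so it cannot produce the per-word counts the question asks for. A minimal sketch illustrating this, using a `\w+` variant of the pattern with `Pattern.CASE_INSENSITIVE`; the class name `AdjacentRepeats` and the helper `adjacentRepeats` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjacentRepeats {
    // Hypothetical helper: collects every "word, whitespace, same word" match.
    // \1 is a back-reference to group 1, so only a word repeated right away can match;
    // with CASE_INSENSITIVE the back-reference also ignores case, so "Cat cat" matches.
    public static List<String> adjacentRepeats(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {          // exactly one find() per loop iteration
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(adjacentRepeats("Cat cat cat cattie cat"));   // [Cat cat]
        System.out.println(adjacentRepeats(
                "I have two cars in my garage and my dad has one car in his garage"));   // []
    }
}
```

"Cat cat cat" yields a single match because `find()` resumes after the end of the previous match, and the sentence from the question yields none, since "my", "in", and "garage" are never adjacent to their repeats.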
roxch

2 Answers


I don't think you can solve this problem with a regex alone.
Here is a solution using a Set:

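    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Pattern;

    public class RepeatedWords {   // class name is illustrative
        public static void main(String[] args) {
            String str = " I have two cars   in my garage and my dad has one   car in his garage ";
            System.out.println(str);

            String low = str.trim().toLowerCase();
            String[] words = low.split("\\s+");

            // the set holds each distinct word once
            Set<String> setOfWords = new HashSet<String>(Arrays.asList(words));

            // pad both ends and double every whitespace so each word has its own surrounding spaces
            low = " " + str.toLowerCase() + " ";
            low = low.replaceAll("\\s", "  ");

            for (String s : setOfWords) {
                // Pattern.quote treats the word literally, even if it contains regex metacharacters
                String without = low.replaceAll(Pattern.quote(" " + s + " "), "");
                // each removed occurrence shortened the string by s.length() + 2
                int counter = (low.length() - without.length()) / (s.length() + 2);
                if (counter > 1)
                    System.out.println(s + " repeated " + counter + " times.");
            }
        }
    }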

It will print:

 I have two cars   in my garage and my dad has one   car in his garage 
in repeated 2 times.
garage repeated 2 times.
my repeated 2 times.
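The same Set-based idea can also be sketched without the string-length arithmetic, by counting exact tokens with `Collections.frequency`. This is a variation, not the answer's own code: the class and method names (`SetFrequency`, `repeated`) are hypothetical, and a `LinkedHashSet` is assumed so that results come out in first-occurrence order rather than hash order:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SetFrequency {
    // Hypothetical helper: maps each repeated word (lower-cased) to its count.
    public static Map<String, Integer> repeated(String str) {
        List<String> words = Arrays.asList(str.trim().toLowerCase().split("\\s+"));
        Set<String> distinct = new LinkedHashSet<>(words);   // each word once, first-occurrence order
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String s : distinct) {
            int counter = Collections.frequency(words, s);   // exact token matches only
            if (counter > 1) {
                result.put(s, counter);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = " I have two cars   in my garage and my dad has one   car in his garage ";
        repeated(str).forEach((w, c) -> System.out.println(w + " repeated " + c + " times."));
    }
}
```

For the sample string this prints "in", "my", and "garage", each repeated 2 times, in that order. Each `frequency` call scans the whole token list, so this is still quadratic in the worst case; it mainly avoids building a regex from every word.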
forpas

You can find the duplicate words as shown in the code below:

package duplicatewords;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


/**
 *
 * @author sami
 */
public class DuplicateWords {

    private static final String INPUT = "Cat cat cat cattie cat";

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<String> wordsWithCase = DuplicateWords(INPUT);
        List<String> wordsWithoutCase = DuplicateWordsDespiteOfCase(INPUT);
        CountDuplicateWords(INPUT, wordsWithCase);
        // the case-insensitive duplicates are lower case, so count them in a lower-cased copy
        CountDuplicateWords(INPUT.toLowerCase(), wordsWithoutCase);
    }

    /**
     * Find the words that occur more than once, treating upper and lower case as different
     * @param inputValue Input String
     * @return duplicateWords List of the words which are duplicated in the string.
     */
    private static List<String> DuplicateWords(String inputValue) {
        String[] breakWords = inputValue.trim().split("\\s+");
        List<String> seenWords = new ArrayList<>();
        List<String> duplicateWords = new ArrayList<>();
        for (String word : breakWords) {
            if (!seenWords.contains(word)) {
                seenWords.add(word);              // first occurrence
            } else if (!duplicateWords.contains(word)) {
                duplicateWords.add(word);         // second occurrence: it is a duplicate
            }
        }
        return duplicateWords;
    }

    /**
     * Find the words that occur more than once, ignoring upper and lower case
     * @param inputValue Input String
     * @return duplicateWords List of the (lower-cased) words which are duplicated in the string.
     */
    private static List<String> DuplicateWordsDespiteOfCase(String inputValue) {
        String[] breakWords = inputValue.trim().toLowerCase().split("\\s+");
        List<String> seenWords = new ArrayList<>();
        List<String> duplicateWords = new ArrayList<>();
        for (String word : breakWords) {
            if (!seenWords.contains(word)) {
                seenWords.add(word);              // first occurrence
            } else if (!duplicateWords.contains(word)) {
                duplicateWords.add(word);         // second occurrence: it is a duplicate
            }
        }
        return duplicateWords;
    }

    /**
     * Count the occurrences of each duplicated word in the string
     * @param inputValue Input String
     * @param duplicatedWords List of the duplicated words
     */
    private static void CountDuplicateWords(String inputValue, List<String> duplicatedWords) {
        System.out.println("Duplicate words: " + duplicatedWords);
        for (String value : duplicatedWords) {
            // \b on both sides so "cat" does not also match inside "cattie";
            // Pattern.quote makes the word a literal rather than a regex
            Pattern pattern = Pattern.compile("\\b" + Pattern.quote(value) + "\\b");
            Matcher matcher = pattern.matcher(inputValue);
            int i = 0;
            while (matcher.find()) {
                i++;
            }
            System.out.println(value + " -- repeated " + i + " times");
        }
    }
}

The DuplicateWords method finds the words that are duplicated in the string, treating upper and lower case as different; the DuplicateWordsDespiteOfCase method finds the words that are duplicated regardless of case (which is what your question asks for). Once the duplicate words are found, CountDuplicateWords counts their occurrences in the string.

You can remove the DuplicateWords method if you don't need it; it is only there for reference.
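Why word boundaries matter when counting with a regex: a bare `Pattern.compile(value)` also matches inside longer words, and it is case-sensitive. A small sketch assuming the sample input `"Cat cat cat cattie cat"`; the `BoundaryCount` class and its `count` helper are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryCount {
    // Hypothetical helper: number of non-overlapping matches of regex in input.
    public static int count(String regex, String input, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "Cat cat cat cattie cat";
        String whole = "\\b" + Pattern.quote("cat") + "\\b";   // "cat" as a whole word, taken literally
        System.out.println(count("cat", input, 0));                        // 4: misses "Cat", hits "cattie"
        System.out.println(count(whole, input, 0));                        // 3: whole words, case-sensitive
        System.out.println(count(whole, input, Pattern.CASE_INSENSITIVE)); // 4: Cat, cat, cat, cat
    }
}
```

The plain pattern returns 4 only by coincidence: it skips "Cat" but counts the "cat" inside "cattie".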

Sami Ahmed Siddiqui
  • Try to use a Set or a Map. Your code has a complexity of `O(n^2)`. – nice_dev Oct 28 '18 at 18:56
  • Beautifully abstracted, written, and documented. But Java has many shortcuts for what you did, and too much code is not good either. For example, you could use a `HashSet`, a collection that doesn't allow duplicates to be added, and more... – roxch Oct 30 '18 at 07:38
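Following the comments' suggestion of a Set or Map, a single pass with a map counter avoids the repeated list scans. A minimal sketch, assuming words are separated by whitespace and punctuation is not stripped; `WordCounts` and `repeatedWords` are hypothetical names, and a `LinkedHashMap` is used so that results keep first-occurrence order:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class WordCounts {
    // Hypothetical helper: case-insensitive count of every word that occurs more than once.
    public static Map<String, Integer> repeatedWords(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();   // keeps first-occurrence order
        for (String word : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            counts.merge(word, 1, Integer::sum);               // 1 for a new word, +1 otherwise
        }
        counts.values().removeIf(c -> c < 2);                  // keep only repeated words
        return counts;
    }

    public static void main(String[] args) {
        String input = "I have two cars in my garage and my dad has one car in his garage";
        repeatedWords(input).forEach((word, count) ->
                System.out.println(word + " -- repeated " + count + " times"));
    }
}
```

For the question's sentence this prints "in", "my", and "garage", each "-- repeated 2 times". `Map.merge` inserts 1 for a new key and otherwise combines the old value with 1 via `Integer::sum`, so counting takes one pass over the tokens; a blank input produces a single empty token that the `removeIf` step discards.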