
For a class project, we have to take a string (a paragraph), split it into an array of individual words, and then turn those words into Word objects stored in an array. The words cannot repeat, so I used a Set to keep only the unique values, but certain words still repeat! Here is the code for the method. Sorry for the vague description.

private void processDocument()
    {
    String r = docReader.getLine();
    lines++;
    while(docReader.hasLines()==true)
    {
        r= r+" " +docReader.getLine();
        lines++;
    }
    r = r.trim();
    String[] linewords = r.split(" ");
    while(linewords.length>words.length)
    {
        this.expandWords();
    }
    String[] newWord = new String[linewords.length];
    for(int i=0;i<linewords.length;i++)
    {

        newWord[i] = (this.stripPunctuation(linewords[i]));
    }

    Set<String> set = new HashSet<String>(Arrays.asList(newWord));
    Object[]newArray = set.toArray();
    words = new Word[set.size()-1];
    String newString = null;
    for(int i =0;i<set.size();i++)
    {
        if(i==0)
        {
            newString = newArray[i].toString() + "";
        }
        else
        {
            newString = newString+newArray[i].toString()+" ";
        }
    }
    newString = newString.trim();
    String[] newWord2 = newString.split(" ");
    for(int j=0;j<set.size()-1;j++)
    {


        Word newWordz = new Word(newWord2[j].toLowerCase());
        words[j] = newWordz;

    }

There's a *lot* of code there - I strongly suspect you don't need nearly that much to demonstrate the problem. It would really help if you'd show a short but complete program demonstrating the problem. – Jon Skeet Nov 22 '13 at 18:57

Can you give examples? One or two lines of input and the words that repeat out of those lines would be helpful. – GregA100k Nov 22 '13 at 19:01

2 Answers


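I believe the problem is that the words are capitalized differently when you put them into the HashSet. String comparison is case-sensitive, so "The" and "the" are not equal (and have different hash codes), and the set keeps both. Your code only lowercases the words after the set has been built, which brings the duplicates back. Convert everything to lowercase the moment you read it from the file and it should work.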

newWord[i] = (this.stripPunctuation(linewords[i])).toLowerCase();
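
To illustrate the point about capitalization, here is a minimal sketch. The class and method names (`CaseSensitiveSetDemo`, `dedupeAsIs`, `dedupeIgnoringCase`) are made up for this example and are not part of the asker's code; it only shows that a `HashSet<String>` keeps differently cased words apart unless they are normalized first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CaseSensitiveSetDemo {
    // Deduplicates the words exactly as given; String equality is case-sensitive.
    public static Set<String> dedupeAsIs(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }

    // Lowercases each word before adding it, so "The" and "the" collapse into one entry.
    public static Set<String> dedupeIgnoringCase(String[] words) {
        Set<String> unique = new HashSet<>();
        for (String w : words) {
            unique.add(w.toLowerCase(Locale.ROOT));
        }
        return unique;
    }

    public static void main(String[] args) {
        String[] words = {"The", "the", "cat"};
        System.out.println(dedupeAsIs(words).size());         // prints 3
        System.out.println(dedupeIgnoringCase(words).size()); // prints 2
    }
}
```

Lowercasing after the set is built (as the question's code does with `newWord2[j].toLowerCase()`) comes too late, because the set has already accepted both spellings as distinct.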
Brinnis

Try this:

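public String[] unique(String[] array) {
   // toArray() with no argument returns Object[]; pass a String[] to get a String[] back
   return new HashSet<String>(Arrays.asList(array)).toArray(new String[0]);
}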

Shamelessly copied from Bohemian's answer.

Also, as noted by @Brinnis, make sure that words are trimmed and in the right case.

for(int i = 0; i < linewords.length; i++) {
   newWord[i] = this.stripPunctuation(linewords[i]).toLowerCase(); 
}
String[] newArray = unique(newWord);
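
Putting the pieces together, the whole step could look roughly like the sketch below. It is only an outline under some assumptions: `UniqueWords` and `uniqueWords` are hypothetical names, the `stripPunctuation` shown here is a stand-in for the asker's own method (whose implementation was not posted), and a `LinkedHashSet` is used instead of `HashSet` purely to keep the words in first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class UniqueWords {
    // Hypothetical stand-in for the asker's stripPunctuation: removes ASCII punctuation.
    public static String stripPunctuation(String word) {
        return word.replaceAll("\\p{Punct}", "");
    }

    // Splits on runs of whitespace, normalizes each token, and keeps the first occurrence of each word.
    public static String[] uniqueWords(String text) {
        Set<String> unique = new LinkedHashSet<>();
        for (String token : text.trim().split("\\s+")) {
            String word = stripPunctuation(token).toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) { // skip tokens that were only punctuation or empty input
                unique.add(word);
            }
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = uniqueWords("The cat saw the  Cat. The end!");
        System.out.println(Arrays.toString(words)); // prints [the, cat, saw, end]
    }
}
```

Splitting on `\\s+` rather than a single space avoids empty strings when the text contains consecutive spaces, and normalizing before the set is filled means there is no need to rebuild a string and split it again afterwards.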
Community
Anthony Accioly