I have a text file that is sorted alphabetically, with around 94,000 lines of names (one name per line, text only, no punctuation).

Example:

Alice

Bob

Simon

Simon

Tom

Each line takes the same form: the first letter is capitalized, and there are no accented letters.

My code:

try{
        BufferedReader br = new BufferedReader(new FileReader("orderedNames.txt"));
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("sortedNoDuplicateNames.txt", true)));

        ArrayList<String> textToTransfer = new ArrayList();


        String previousLine = "";
        String current = "";

        //Load first line into previous line
        previousLine = br.readLine();

        //Add first line to the transfer list
        textToTransfer.add(previousLine);


        while((current = br.readLine()) != previousLine && current != null){

            textToTransfer.add(current);
            previousLine = current;
        }
        int index = 0;
        for(int i=0; i<textToTransfer.size(); i++){
            out.println(textToTransfer.get(i));
            System.out.println(textToTransfer.get(i));
            index ++;

        }
        System.out.println(index);

}catch(Exception e){
    e.printStackTrace();
}

From what I understand, the first line of the file is read and loaded into the previousLine variable as I intended, and current is set to the second line of the file. current is then compared against previousLine and against null; if it differs from the previous line and is not null, it is added to the ArrayList.

previousLine is then set to current's value, so the next readLine call can overwrite current and the comparison continues in the while loop.

I cannot see what is wrong with this. If a duplicate is found, surely the loop should break?

Sorry in advance when it turns out to be something stupid.

Tom O.
`!(current = br.readLine()).equals(previousLine)` – Pavneet_Singh Aug 22 '17 at 17:17

`List` doesn't sound like the right data structure for this problem. I think you want to use some implementation of a `Set`, because a `Set` does not store duplicates the way a `List` will. It is always good to think through your choice of data structure instead of arbitrarily deciding an `ArrayList` is best. [Check out this SO question for details](https://stackoverflow.com/questions/1035008/what-is-the-difference-between-set-and-list) – Tom O. Aug 22 '17 at 17:23

3 Answers

Use a TreeSet instead of an ArrayList.

Set<String> textToTransfer = new TreeSet<>();

The TreeSet is sorted and does not allow duplicates.
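
As a rough illustration of that suggestion, here is a minimal sketch that reads lines into a TreeSet. The class name TreeSetDedup and the method readUnique are made up for this example, and the sample names are fed in from a string rather than from orderedNames.txt so the snippet runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this sketch.
class TreeSetDedup {
    // Reads every line from the reader into a TreeSet, which keeps one copy
    // of each distinct line in natural (String.compareTo) order.
    static Set<String> readUnique(BufferedReader reader) throws IOException {
        Set<String> names = new TreeSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            names.add(line); // adding a name that is already present has no effect
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Sample data from the question, supplied inline instead of from a file.
        String sample = "Alice\nBob\nSimon\nSimon\nTom\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(readUnique(reader)); // prints [Alice, Bob, Simon, Tom]
        }
    }
}
```

To use it on the real file, the StringReader could be swapped for a FileReader on orderedNames.txt. Note that the set holds every distinct name in memory before anything is written out.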

Juan Carlos Mendoza

Don't reinvent the wheel!

If you don't want duplicates, you should consider using a Collection that doesn't allow them. The easiest way to remove repeated elements is to add the contents to a Set, which will not store duplicates:

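import java.util.*;
import java.util.stream.*;

public class RemoveDups {
    public static void main(String[] args) {
        // Collecting into a Set drops repeated arguments; iteration order is not guaranteed.
        Set<String> dist = Arrays.stream(args).collect(Collectors.toSet());
        System.out.println(dist);
    }
}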
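
Applied to a list of lines rather than to command-line arguments, the same idea might look like the sketch below. It swaps Collectors.toSet() for Stream.distinct(), because toSet() makes no promise about ordering while distinct() on an ordered stream keeps the first occurrence of each element in encounter order. The names StreamDistinct, distinctLines and dedupeFile are assumptions made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class StreamDistinct {
    // distinct() keeps the first occurrence of each line and preserves encounter order.
    static List<String> distinctLines(List<String> lines) {
        return lines.stream().distinct().collect(Collectors.toList());
    }

    // Hypothetical helper: reads all lines of 'in' and writes the distinct ones to 'out'.
    static void dedupeFile(Path in, Path out) throws IOException {
        Files.write(out, distinctLines(Files.readAllLines(in)));
    }

    public static void main(String[] args) {
        // Sample data from the question, supplied inline.
        List<String> sample = Arrays.asList("Alice", "Bob", "Simon", "Simon", "Tom");
        System.out.println(distinctLines(sample)); // prints [Alice, Bob, Simon, Tom]
    }
}
```

For the files in the question, dedupeFile could be called with Paths.get("orderedNames.txt") and Paths.get("sortedNoDuplicateNames.txt").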

Another way is to remove the duplicates from the text file before the Java code reads it, for example on Linux (often far quicker than doing it in Java code):

sort myFileWithDuplicates.txt | uniq > myFileWithoutDuplicates.txt
1ac0

While, like the others, I recommend using a collection that does not allow repeated entries, I think I can identify what is wrong with your code. The way you compare strings in your while loop is incorrect in Java. The == operator (and its counterpart !=) checks whether two references point to the same object, which is not the same as checking whether their values are equal. Because each readLine() call returns a separate String object, current != previousLine is true even when both lines contain the same text, so the duplicate is never detected. Luckily, Java's String class has an instance method for comparing contents, equals(). Since readLine() returns null at the end of the file, check for null before calling equals(). You may want something like this:

while((current = br.readLine()) != null && !current.equals(previousLine)){

Keep in mind that breaking out of your while loop here stops reading the file at the first duplicate, which may or may not be what you intended.
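
If the goal is to skip duplicates and keep reading rather than stop, one possible shape is sketched below. It assumes the input is already sorted, so equal names are always on adjacent lines. The class name SkipAdjacentDuplicates and the method dedupeSorted are made up for this example, and the sample input comes from a string instead of the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class SkipAdjacentDuplicates {
    // Copies each line unless it has the same text as the line directly before it.
    // Assumes sorted input, so all copies of a name sit next to each other.
    static List<String> dedupeSorted(BufferedReader reader) throws IOException {
        List<String> result = new ArrayList<>();
        String previous = null;
        String current;
        while ((current = reader.readLine()) != null) { // null marks end of input
            if (!current.equals(previous)) {            // compares contents, not references
                result.add(current);
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Sample data from the question, supplied inline instead of from a file.
        String sample = "Alice\nBob\nSimon\nSimon\nTom\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(dedupeSorted(reader)); // prints [Alice, Bob, Simon, Tom]
        }
    }
}
```

The null check stays in the loop condition and the equals() test moves into the body, so a duplicate only skips one line instead of ending the loop.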

MasterChef