Show duplicates in a String Array from csv File (Java)

Question

I created an array from a CSV file and now have to output any values that occur more than once. The file has a layout of 5x9952 (five columns). It consists of the data:

id,birthday,name,sex, first name

I'd now like the program to show me, for each column (e.g. name), which duplicates there are, for example two people with the same name. But whatever I try from what I found on the Internet only shows me duplicates within a row (like when name and first name are the same). Here's what I've got so far:

package javacvs;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 *
 * @author Tobias
 */
public class main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        String csvFile = "/Users/Tobias/Desktop/PatDaten/123.csv";
        String line = "";
        String cvsSplitBy = ",";

        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {

            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] patDaten = line.split(cvsSplitBy);


                for (int i = 0; i < patDaten.length-1; i++)
                {
                    for (int j = i+1; j < patDaten.length; j++)
                    {
                        if( (patDaten[i].equals(patDaten[j])) && (i != j) )
                        {
                            System.out.println("Duplicate Element is : "+patDaten[j]);
                        }
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

(I changed the name of the csv as it contains confidential data)

score 0 · Answer 1 · answered Sep 01 '17 at 07:42

You are iterating over the fields of a row instead of over a column. What you need is the same loop, but running over the column.

What you can do is accumulate the names in a separate array and then iterate over it. I am sure you know the index of the column you want to compare. So you need one extra loop that collects the column you want to check for duplicates.
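A minimal sketch of that idea, assuming the lines have already been read and that fields contain no quoted commas. The class and method names (ColumnDuplicates, extractColumn, findDuplicates) and the sample rows are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnDuplicates {

    // Extra loop: collect the values of one column (0-based index) from the CSV lines.
    static List<String> extractColumn(List<String> lines, int columnIndex) {
        List<String> column = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (columnIndex < fields.length) { // skip short or malformed rows
                column.add(fields[columnIndex].trim());
            }
        }
        return column;
    }

    // Same nested loop as in the question, but running over a column instead of a row.
    // Each duplicated value is reported once.
    static List<String> findDuplicates(List<String> column) {
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < column.size() - 1; i++) {
            for (int j = i + 1; j < column.size(); j++) {
                String value = column.get(i);
                if (value.equals(column.get(j)) && !duplicates.contains(value)) {
                    duplicates.add(value);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Invented sample rows in the layout id,birthday,name,sex,first name
        List<String> lines = Arrays.asList(
                "1,1980-01-01,Meier,m,Hans",
                "2,1975-05-05,Schmidt,f,Anna",
                "3,1990-09-09,Meier,f,Eva");
        List<String> names = extractColumn(lines, 2); // index 2 = name
        for (String name : findDuplicates(names)) {
            System.out.println("Duplicate name: " + name);
        }
    }
}
```

The nested loop is quadratic in the number of rows, which should still be acceptable for a file of roughly 10,000 lines.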

score 0 · Accepted Answer · answered Sep 01 '17 at 07:52

The real thing here: stop thinking "low level". Good OOP is about creating helpful abstractions.

In other words: your first step should be to create a meaningful class definition that represents the content of one row; let's call it the Person class for now. Then you separate your further concerns:

you create one class/method that does nothing but read that CSV file and create one Person object per row
you create a meaningful data structure that tells you about duplicates

The latter could (for example) be some kind of reverse index. Meaning: you have a Map<String, List<Person>>. After you have read all your Person objects (maybe into a simple list), you can do this:

Map<String, List<Person>> personsByName = new HashMap<>();
for (Person p : persons) {
  List<Person> personsForName = personsByName.get(p.getName());
  if (personsForName == null) {
    // first time this name shows up: start a new list for it
    personsForName = new ArrayList<>();
    personsByName.put(p.getName(), personsForName);
  }
  personsForName.add(p);
}

After that loop, the map contains all names used in your table, and for each name you have a list of the corresponding persons. Any name whose list holds more than one person is a duplicate.
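To make this concrete, here is one possible sketch of such a Person class together with the grouping step. Everything apart from Person, getName() and the map type is an assumption: the field names are guessed from the column layout in the question, fromCsvLine, groupByName and duplicateNames are hypothetical helpers, and the grouping uses Map.computeIfAbsent as a shorter variant of the explicit null check:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PersonIndex {

    // Hypothetical row class; fields mirror the columns id,birthday,name,sex,first name.
    static final class Person {
        private final String id;
        private final String birthday;
        private final String name;
        private final String sex;
        private final String firstName;

        Person(String id, String birthday, String name, String sex, String firstName) {
            this.id = id;
            this.birthday = birthday;
            this.name = name;
            this.sex = sex;
            this.firstName = firstName;
        }

        // Assumes plain comma-separated fields without quoting.
        static Person fromCsvLine(String line) {
            String[] f = line.split(",", -1);
            if (f.length < 5) {
                throw new IllegalArgumentException("Expected 5 fields: " + line);
            }
            return new Person(f[0].trim(), f[1].trim(), f[2].trim(), f[3].trim(), f[4].trim());
        }

        String getName() {
            return name;
        }

        @Override
        public String toString() {
            return id + "," + birthday + "," + name + "," + sex + "," + firstName;
        }
    }

    // Reverse index: name -> all persons carrying that name.
    static Map<String, List<Person>> groupByName(List<Person> persons) {
        Map<String, List<Person>> personsByName = new HashMap<>();
        for (Person p : persons) {
            personsByName.computeIfAbsent(p.getName(), k -> new ArrayList<>()).add(p);
        }
        return personsByName;
    }

    // Names that map to more than one person, sorted for stable output.
    static List<String> duplicateNames(Map<String, List<Person>> personsByName) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Person>> entry : personsByName.entrySet()) {
            if (entry.getValue().size() > 1) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Invented sample rows, not data from the question
        List<Person> persons = new ArrayList<>();
        for (String line : Arrays.asList(
                "1,1980-01-01,Meier,m,Hans",
                "2,1975-05-05,Schmidt,f,Anna",
                "3,1990-09-09,Meier,f,Eva")) {
            persons.add(Person.fromCsvLine(line));
        }
        Map<String, List<Person>> personsByName = groupByName(persons);
        for (String name : duplicateNames(personsByName)) {
            System.out.println(name + " -> " + personsByName.get(name));
        }
    }
}
```

The same pattern works for any other column: add a getter for it and group by that getter instead of getName().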

score 0 · Answer 3 · Viktor Mellgren · 2017-09-01T08:11:19.520

It's a bit unclear what you want presented: the whole record, or only the fact that there are duplicate names.

For the name only:

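// Needs: java.nio.file.Files, java.nio.file.Paths, java.util.HashSet, java.util.List, java.util.Set
String csvFile = "test.csv";

// readAllLines throws IOException, so call this from a method that declares or catches it
List<String> readAllLines = Files.readAllLines(Paths.get(csvFile));

Set<String> names = new HashSet<>();

// Set.add returns false when the name is already present, i.e. on every repeat
readAllLines.stream().map(s -> s.split(",")[2]).forEach(name -> {
    if (!names.add(name)) {
        System.out.println("Duplicate name: " + name);
    }
});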

For the whole record:

String csvFile = "test.csv";

List<String> readAllLines = Files.readAllLines(Paths.get(csvFile));

Set<String> names = new HashSet<>();
readAllLines.stream().forEach(record -> {
    String name = record.split(",")[2];
    if (!names.add(name)) {
        System.out.println("Duplicate name: " + name + " with record " + record);
    }
});

edited Sep 01 '17 at 08:11

answered Sep 01 '17 at 08:05

Viktor Mellgren


Thanks, the second one is exactly what I was looking for. The only problem that emerged now is that the loop won't stop on its own and I'm not sure how to fix it. – TobiasL Sep 01 '17 at 08:42
@TobiasL According to [this answer](https://stackoverflow.com/questions/23996454/terminate-or-break-java-8-stream-loop) `Stream.forEach` is not a loop and it's not designed to be terminated with something like `break`. So you have to use ordinary loops instead. – IQV Sep 01 '17 at 08:48
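Following that comment, the whole-record check could be rewritten with a plain for-each loop, roughly as sketched below. The class and method names (DuplicateRecords, recordsWithRepeatedName) are invented for this example, and the file path is taken from the first command-line argument as an assumption. The loop ends once all lines have been processed, and `break` or `return` would work inside it if an early exit is needed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateRecords {

    // Returns every record whose name (column index 2) already appeared in an earlier record.
    static List<String> recordsWithRepeatedName(List<String> lines) {
        Set<String> names = new HashSet<>();
        List<String> repeated = new ArrayList<>();
        for (String record : lines) {
            String[] fields = record.split(",");
            if (fields.length <= 2) {
                continue; // skip blank or short lines instead of failing on them
            }
            if (!names.add(fields[2])) { // add returns false if the name was seen before
                repeated.add(record);
            }
        }
        return repeated;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java DuplicateRecords <csv file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String record : recordsWithRepeatedName(lines)) {
            System.out.println("Duplicate name in record " + record);
        }
    }
}
```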

score 0 · Answer 4 · answered Sep 01 '17 at 08:15

Your problem is the nesting of your loops. You read one line, split it up, and then compare the fields of that one row with each other. You never compare one line with the other lines!

So first you need an array for all lines so you can compare these lines. As GhostCat recommended in his answer you should use your own class Person which has the five fields as attributes. But you could use a second array, so you can work with the indexes as Alexander Petrov said in his answer. In the latter case, you get a two-dimensional array:

String[][] patDaten;

After that you read all lines of your csv-file and for each line you create a new Person or a new inner array.

After reading the entire file, you compare the fields as you want. This is where you use your double loop. So you compare patDaten[i].getName() with patDaten[j].getName() or, in the array case, patDaten[i][2] with patDaten[j][2] (name is the third column in your layout, so its index is 2).
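A rough sketch of the two-dimensional array variant, checking every column in turn. The names TableDuplicates, toTable and duplicatesInColumn are made up for this example, the sample rows are invented, and fields are assumed to contain no quoted commas:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TableDuplicates {

    // Builds the two-dimensional array: patDaten[row][column].
    static String[][] toTable(List<String> lines) {
        String[][] patDaten = new String[lines.size()][];
        for (int i = 0; i < lines.size(); i++) {
            patDaten[i] = lines.get(i).split(",", -1); // -1 keeps trailing empty fields
        }
        return patDaten;
    }

    // Double loop over the rows, comparing only the given column.
    static List<String> duplicatesInColumn(String[][] patDaten, int col) {
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < patDaten.length - 1; i++) {
            if (col >= patDaten[i].length) {
                continue; // row i has no such column
            }
            for (int j = i + 1; j < patDaten.length; j++) {
                if (col < patDaten[j].length
                        && patDaten[i][col].equals(patDaten[j][col])
                        && !duplicates.contains(patDaten[i][col])) {
                    duplicates.add(patDaten[i][col]);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Invented sample rows in the layout id,birthday,name,sex,first name
        String[][] patDaten = toTable(Arrays.asList(
                "1,1980-01-01,Meier,m,Hans",
                "2,1975-05-05,Schmidt,f,Anna",
                "3,1990-09-09,Meier,f,Eva"));
        String[] header = {"id", "birthday", "name", "sex", "first name"};
        for (int col = 0; col < header.length; col++) {
            System.out.println(header[col] + ": " + duplicatesInColumn(patDaten, col));
        }
    }
}
```

Columns such as sex will naturally report almost every value as a duplicate, so in practice you would probably restrict the check to the columns you care about.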
