
I am currently working on a project that needs to remove duplicate sets of values from a CSV file. The file is generated by the Java code listed below:

CSVUtilsExample.java

package lacsp.portal.backing.oracle.webcenter.portalapp.pages;

import java.io.FileWriter;

import java.text.DecimalFormat;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class CSVUtilsExample {

public static void main(String[] args) throws Exception {

    Random rand = new Random();
    int randomSets = rand.nextInt(100000) + 1;
    int val = 1;
    final DecimalFormat decimalFormat = new DecimalFormat("0000");
    String csvFile = "C:/work/tableOutput.csv";
    FileWriter writer = new FileWriter(csvFile);
    CSVUtils.writeLine(writer, Arrays.asList("SET_ID", "INT_VALUE"));
    // Will loop whilst val is less than the random sets generated
    while (val <= randomSets) {
        // Create an empty list
        List<Order> orders = new ArrayList<Order>();
        // Single set id for all items
        String setId = "S" + decimalFormat.format(val);
        // Create a bunch of orders: nextInt(490) + 10 yields between 10 and 499
        int numOrders = rand.nextInt(490)+10;
        for (int i = 0; i < numOrders; i++) {
            // Create a new Order and add it to the list
            orders.add(new Order(setId, rand.nextInt(1000) + 1));
        }
        for (Order o : orders) {
            List<String> list = new ArrayList<String>();
            list.add(o.getSET_ID());
            list.add(o.getINT_VALUE().toString());
            CSVUtils.writeLine(writer, list);
        }
        val++;
    }

    writer.flush();
    writer.close();

}


}

Order.java

package lacsp.portal.backing.oracle.webcenter.portalapp.pages;

public class Order {

private String SET_ID;
private Integer INT_VALUE;

public Order(String SET_ID, Integer INT_VALUE) {
    this.SET_ID = SET_ID;
    this.INT_VALUE = INT_VALUE;

}

public void setSET_ID(String SET_ID) {
    this.SET_ID = SET_ID;
}

public String getSET_ID() {
    return SET_ID;
}

public void setINT_VALUE(Integer INT_VALUE) {
    this.INT_VALUE = INT_VALUE;
}

public Integer getINT_VALUE() {
    return INT_VALUE;
}
}

When I run the above, it creates a .csv file with up to 100000 SET_ID sets, each with a random number of INT_VALUE records. Once the file has been created, I would like a method that removes any duplicate sets, or perhaps strips them out into a separate file. For example:

SET_ID, INT_VALUE
'S0001', 1
'S0001', 3
'S0001', 12
'S0001', 7
'S0001', 9

'S0002', 3
'S0002', 12
'S0002', 7

'S0003', 5
'S0003', 6
'S0003', 7
'S0003', 12
'S0003', 13

'S0004', 5
'S0004', 6
'S0004', 7
'S0004', 12
'S0004', 13

Should be reduced to

SET_ID, INT_VALUE
'S0001', 1
'S0001', 3
'S0001', 12
'S0001', 7
'S0001', 9

'S0003', 5
'S0003', 6
'S0003', 7
'S0003', 12
'S0003', 13

Could anyone assist with this, or suggest what the best approach would be?

  • Why should S0002 be removed? It is a subset, not a duplicate. – Absent Sep 27 '17 at 09:07
  • Hi Ivo, S0002 needs to be removed because its INT_VALUEs all already exist in the S0001 set. If, for example, S0002 also had the INT_VALUE 2, then it would not need to be removed. Hope this makes sense. – user3293437 Sep 27 '17 at 09:10
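
The rule described in these comments (drop a set when every one of its values already appears in another set) can be sketched as follows. This is only an illustration under assumptions: the class name SubsetFilter and its methods are hypothetical, rows are assumed to be already parsed into {SET_ID, INT_VALUE} string pairs, and when two sets are identical the one that appears first is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the original project.
public class SubsetFilter {

    // Groups {SET_ID, INT_VALUE} rows by SET_ID, preserving first-seen order.
    static Map<String, Set<Integer>> group(List<String[]> rows) {
        Map<String, Set<Integer>> sets = new LinkedHashMap<>();
        for (String[] row : rows) {
            sets.computeIfAbsent(row[0], k -> new LinkedHashSet<>())
                .add(Integer.valueOf(row[1].trim()));
        }
        return sets;
    }

    // Keeps a set unless all of its values are contained in another set.
    // Assumption: for identical sets, the one that appears first is kept.
    static Map<String, Set<Integer>> removeCovered(Map<String, Set<Integer>> sets) {
        List<String> ids = new ArrayList<>(sets.keySet());
        Map<String, Set<Integer>> kept = new LinkedHashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            Set<Integer> candidate = sets.get(ids.get(i));
            boolean covered = false;
            for (int j = 0; j < ids.size() && !covered; j++) {
                if (i == j) {
                    continue;
                }
                Set<Integer> other = sets.get(ids.get(j));
                if (other.containsAll(candidate)) {
                    // A strictly larger set always covers the candidate;
                    // an equal set covers it only if it appeared earlier.
                    covered = other.size() > candidate.size() || j < i;
                }
            }
            if (!covered) {
                kept.put(ids.get(i), candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The example data from the question.
        List<String[]> rows = new ArrayList<>();
        for (int v : new int[] {1, 3, 12, 7, 9}) rows.add(new String[] {"S0001", "" + v});
        for (int v : new int[] {3, 12, 7}) rows.add(new String[] {"S0002", "" + v});
        for (int v : new int[] {5, 6, 7, 12, 13}) rows.add(new String[] {"S0003", "" + v});
        for (int v : new int[] {5, 6, 7, 12, 13}) rows.add(new String[] {"S0004", "" + v});

        System.out.println(removeCovered(group(rows)).keySet()); // prints [S0001, S0003]
    }
}
```

Note that this compares every pair of sets, so it may be slow for a file with close to 100000 sets; it is meant to make the rule concrete rather than to be the fastest approach.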

1 Answer


The easiest way (in my opinion) to have unique SET_ID values is to:

1- store the Order instances inside a Set (see the "add" method of Set in the Java SE API), and ALSO

2- override the equals() and hashCode() methods that class Order inherits from java.lang.Object.

Storing Order instances inside a Set guarantees that there won't be duplicates, and overriding equals() AND hashCode() defines how two instances of Order are compared in order to assess if they're equal or not.

The equals() method must be overridden because the default equals() method inherited from java.lang.Object considers two instances equal only if they are the same object (that is, the same reference).
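
As a small illustration of that difference, the sketch below compares a class that keeps the inherited equals() with one that overrides it. The classes Plain and ByValue are hypothetical and exist only for this demonstration.

```java
import java.util.Objects;

public class EqualsDemo {

    // Hypothetical class that does NOT override equals():
    // the identity check inherited from java.lang.Object applies.
    static class Plain {
        final String setId;

        Plain(String setId) {
            this.setId = setId;
        }
    }

    // Hypothetical class that compares instances by field value.
    static class ByValue {
        final String setId;

        ByValue(String setId) {
            this.setId = setId;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            return Objects.equals(setId, ((ByValue) obj).setId);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals(): equal objects, equal hash codes.
            return Objects.hashCode(setId);
        }
    }

    public static void main(String[] args) {
        Plain a = new Plain("S0001");
        System.out.println(a.equals(new Plain("S0001")));                       // false
        System.out.println(a.equals(a));                                        // true
        System.out.println(new ByValue("S0001").equals(new ByValue("S0001"))); // true
    }
}
```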

I would add the following to Order.java (it needs "import java.util.Objects;" at the top of the file):

@Override
public int hashCode() {
    int hash = 7;
    hash = 17 * hash + Objects.hashCode(this.SET_ID);
    return hash;
}

@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj == null) {
        return false;
    }
    if (getClass() != obj.getClass()) {
        return false;
    }
    final Order other = (Order) obj;
    if (!Objects.equals(this.SET_ID, other.SET_ID)) {
        return false;
    }
    return true;
}

and would change CSVUtilsExample.java so that instead of:

List<Order> orders = new ArrayList<Order>();

it uses (with java.util.Set and java.util.HashSet imported):

Set<Order> orders = new HashSet<Order>();
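
To show how a Set behaves once equals() and hashCode() are overridden, here is a minimal self-contained sketch. It is a variation rather than the code above: the nested Order class is a stand-in for the real one, and it assumes equality should consider both SET_ID and INT_VALUE. With equality based on SET_ID alone, as in the snippet above, the set would keep only a single Order per SET_ID.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

public class DedupDemo {

    // Stand-in for Order.java; assumption: both fields take part in equality.
    static class Order {
        private final String setId;
        private final Integer intValue;

        Order(String setId, Integer intValue) {
            this.setId = setId;
            this.intValue = intValue;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            Order other = (Order) obj;
            return Objects.equals(setId, other.setId)
                && Objects.equals(intValue, other.intValue);
        }

        @Override
        public int hashCode() {
            return Objects.hash(setId, intValue);
        }

        @Override
        public String toString() {
            return setId + "," + intValue;
        }
    }

    // Collects orders into a Set; add() ignores an order equal to one already present.
    static Set<Order> unique(Order... orders) {
        Set<Order> result = new LinkedHashSet<>(); // keeps insertion order
        for (Order o : orders) {
            result.add(o);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Order> orders = unique(
            new Order("S0001", 3),
            new Order("S0001", 3),   // equal to the first, so it is dropped
            new Order("S0001", 7));
        System.out.println(orders); // prints [S0001,3, S0001,7]
    }
}
```

LinkedHashSet is used here only so the rows keep the order in which they were generated; a plain HashSet removes duplicates the same way but does not guarantee iteration order.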
Guillem
  • Hi Guillem, thank you very much for the quick response. However, I am having some difficulty overriding equals() and hashCode(): for Objects.hashCode it is asking me to import a package, but I am unsure which one. What else exactly should I change in the above to adapt it to my code? – user3293437 Sep 27 '17 at 10:19
  • @3293427 Add "import java.util.Objects;" to the Order.java file, then the hashCode() and equals() in my reply should compile. I'd suggest using an IDE to write code; it is helpful for suggesting this kind of thing. – Guillem Sep 27 '17 at 13:13