I am working on a project that needs to remove duplicate sets of values from a CSV file. The test file is generated by the Java code listed below:
CSVUtilsExample.java
package lacsp.portal.backing.oracle.webcenter.portalapp.pages;

import java.io.FileWriter;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class CSVUtilsExample {

    public static void main(String[] args) throws Exception {
        Random rand = new Random();
        // Number of sets to generate: between 1 and 100000
        int randomSets = rand.nextInt(100000) + 1;
        int val = 1;
        final DecimalFormat decimalFormat = new DecimalFormat("0000");
        String csvFile = "C:/work/tableOutput.csv";
        FileWriter writer = new FileWriter(csvFile);
        CSVUtils.writeLine(writer, Arrays.asList("SET_ID", "INT_VALUE"));
        // Loop once per set, until val exceeds the number of random sets
        while (val <= randomSets) {
            // Create an empty list
            List<Order> orders = new ArrayList<Order>();
            // Single set id for all items
            String setId = "S" + decimalFormat.format(val);
            // Create a bunch of orders, between 10 and 499 per set
            int numOrders = rand.nextInt(490) + 10;
            for (int i = 0; i < numOrders; i++) {
                // Create a new Order (value between 1 and 1000) and add it to the list
                orders.add(new Order(setId, rand.nextInt(1000) + 1));
            }
            // Write one CSV row per order
            for (Order o : orders) {
                List<String> list = new ArrayList<String>();
                list.add(o.getSET_ID());
                list.add(o.getINT_VALUE().toString());
                CSVUtils.writeLine(writer, list);
            }
            val++;
        }
        writer.flush();
        writer.close();
    }
}
Order.java
package lacsp.portal.backing.oracle.webcenter.portalapp.pages;

public class Order {

    private String SET_ID;
    private Integer INT_VALUE;

    public Order(String SET_ID, Integer INT_VALUE) {
        this.SET_ID = SET_ID;
        this.INT_VALUE = INT_VALUE;
    }

    public void setSET_ID(String SET_ID) {
        this.SET_ID = SET_ID;
    }

    public String getSET_ID() {
        return SET_ID;
    }

    public void setINT_VALUE(Integer INT_VALUE) {
        this.INT_VALUE = INT_VALUE;
    }

    public Integer getINT_VALUE() {
        return INT_VALUE;
    }
}
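CSVUtils itself is not listed in this post. For running the example, a minimal stand-in could look like the sketch below. It assumes writeLine only joins the values with commas and ends the line, with no quoting or escaping, and it would need the same package declaration as the classes above.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical stand-in for the CSVUtils helper, which is not shown in the post.
final class CSVUtils {

    private CSVUtils() {
        // Static helper only, not meant to be instantiated
    }

    // Writes the values as one comma-separated line.
    // Assumption: values never contain commas, quotes or line breaks.
    static void writeLine(Writer writer, List<String> values) throws IOException {
        writer.write(String.join(",", values));
        writer.write("\n");
    }
}
```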
When I run the above, it creates a .csv file with up to 100,000 SET_IDs, each with a random number of INT_VALUE rows. Once the file has been created, I would like a method that removes any duplicate sets, or perhaps strips them out into a separate file. For example:
SET_ID, INT_VALUE
'S0001', 1
'S0001', 3
'S0001', 12
'S0001', 7
'S0001', 9
'S0002', 3
'S0002', 12
'S0002', 7
'S0003', 5
'S0003', 6
'S0003', 7
'S0003', 12
'S0003', 13
'S0004', 5
'S0004', 6
'S0004', 7
'S0004', 12
'S0004', 13
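Should be reduced to the following (S0002 is dropped because all of its values already appear in S0001, and S0004 because it repeats S0003):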
SET_ID, INT_VALUE
'S0001', 1
'S0001', 3
'S0001', 12
'S0001', 7
'S0001', 9
'S0003', 5
'S0003', 6
'S0003', 7
'S0003', 12
'S0003', 13
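To make the rule concrete, the reduction could be sketched roughly as below. This is only a sketch under assumptions: a set counts as a duplicate when every one of its values already appears in a set kept earlier in file order (which matches the example above), the rows are already parsed into {SET_ID, INT_VALUE} pairs, and the names DuplicateSetFilter, groupBySetId and removeDuplicateSets are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the code in the post.
final class DuplicateSetFilter {

    // Groups {SET_ID, INT_VALUE} rows by SET_ID, keeping first-seen order.
    // Repeated values inside one set collapse to a single entry.
    static Map<String, Set<Integer>> groupBySetId(List<String[]> rows) {
        Map<String, Set<Integer>> sets = new LinkedHashMap<>();
        for (String[] row : rows) {
            sets.computeIfAbsent(row[0].trim(), k -> new LinkedHashSet<>())
                .add(Integer.valueOf(row[1].trim()));
        }
        return sets;
    }

    // Assumed rule: keep a set only if no previously kept set
    // already contains all of its values.
    static Map<String, Set<Integer>> removeDuplicateSets(Map<String, Set<Integer>> sets) {
        Map<String, Set<Integer>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Integer>> candidate : sets.entrySet()) {
            boolean covered = false;
            for (Set<Integer> earlier : kept.values()) {
                if (earlier.containsAll(candidate.getValue())) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.put(candidate.getKey(), candidate.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The sample rows from the example above
        String[][] rows = {
            {"S0001", "1"}, {"S0001", "3"}, {"S0001", "12"}, {"S0001", "7"}, {"S0001", "9"},
            {"S0002", "3"}, {"S0002", "12"}, {"S0002", "7"},
            {"S0003", "5"}, {"S0003", "6"}, {"S0003", "7"}, {"S0003", "12"}, {"S0003", "13"},
            {"S0004", "5"}, {"S0004", "6"}, {"S0004", "7"}, {"S0004", "12"}, {"S0004", "13"}
        };
        Map<String, Set<Integer>> kept = removeDuplicateSets(groupBySetId(Arrays.asList(rows)));
        System.out.println(kept.keySet()); // prints [S0001, S0003]
    }
}
```

Each candidate is compared against every kept set, so the work grows quadratically with the number of sets and may be slow for 100,000 of them. The dropped sets could just as easily be written to a second file instead of being discarded.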
Could anyone please assist with this, or suggest what the best approach would be?