I have a .csv file that looks something like this:
123,1-1-2020,[Apple]
123,1-2-2020,[Apple]
123,1-2-2020,[Beer]
345,1-3-2020,[Bacon]
345,1-4-2020,[Cheese]
345,1-4-2020,[Sausage]
345,1-5-2020,[Bacon]
I wrote a function that merges lines sharing both the same number and the same date: their items are combined into a single Set<String>, and the number of items in that set is inserted before it:
123,1-1-2020,1,[Apple]
123,1-2-2020,2,[Apple, Beer]
345,1-3-2020,1,[Bacon]
345,1-4-2020,2,[Cheese,Sausage]
345,1-5-2020,1,[Bacon]
While this is the intended result, when I run my algorithm on a larger data set, items are sometimes not counted and go missing (the entire line disappears). The example above would sometimes become:
123,1-1-2020,1,[Apple]
123,1-2-2020,2,[Apple, Beer]
345,1-3-2020,1,[Bacon]
345,1-4-2020,2,[Cheese,Sausage]
345,1-5-2020,1,[Bacon]
// Any one of those output lines would sometimes disappear entirely.
I am very confused about why this is happening. Below is the algorithm I implemented:
protected List<Receipt> convert(List<Receipt> list) {
    List<Receipt> receipts = new ArrayList<>();
    for (int i = 0; i < list.size(); i++) {
        List<Receipt> temp = new ArrayList<>();
        temp.add(list.get(i));
        for (int j = i + 1; j < list.size(); j++) {
            if (list.get(i).getNumber().equals(list.get(j).getNumber())) {
                temp.add(list.get(j));
            }
        }
        Map<String, Set<String>> map = new HashMap<>();
        for (Receipt r : temp) {
            if (!map.containsKey(r.getDate())) {
                map.put(r.getDate(), r.getItems());
            } else {
                map.replace(r.getDate(), merge(map.get(r.getDate()), r.getItems()));
            }
        }
        for (Map.Entry<String, Set<String>> m : map.entrySet()) {
            receipts.add(new Receipt(list.get(i).getNumber(), m.getKey(), m.getValue()));
        }
        i = i + temp.size();
    }
    return receipts;
}

// Merge function used above to append two item sets.
private static Set<String> merge(Set<String> a, Set<String> b) {
    return new HashSet<>() {
        {
            addAll(a);
            addAll(b);
        }
    };
}
Each line in the .csv is made into a Receipt object that has the methods getNumber() (returns String), getDate() (returns String), and getItems() (returns Set<String>).
For a .csv with a similar format that has hundreds of thousands of lines, the item Bacon, for example, was found 1334 times in the original data set, but in the output of my algorithm Bacon appears a seemingly random number of times, somewhere between 1160 and 1240. This happens to all other items as well. There is also some strange behavior where a few random lines have their items merged incorrectly (same number, different date, but still merged).
What could possibly be the cause of this randomness?
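One caveat when comparing these counts: merging can legitimately lower the raw count, because an item that appears on several input lines with the same number and date ends up in the output only once. A fairer comparison might be to count the distinct (number, date) pairs that contain the item, which should be identical before and after a correct merge. Below is a minimal sketch of such a check. The class ReceiptCheck, the method countGroupsContaining, and the nested Receipt class are hypothetical stand-ins, not part of the original code, and the sketch assumes that neither the number nor the date contains a comma.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReceiptCheck {

    // Hypothetical stand-in for the Receipt class described in the question.
    static final class Receipt {
        private final String number;
        private final String date;
        private final Set<String> items;

        Receipt(String number, String date, Set<String> items) {
            this.number = number;
            this.date = date;
            this.items = items;
        }

        String getNumber() { return number; }
        String getDate() { return date; }
        Set<String> getItems() { return items; }
    }

    // Counts the distinct (number, date) pairs whose item set contains the given item.
    // Assumes number and date contain no comma, so "number,date" is an unambiguous key.
    static int countGroupsContaining(List<Receipt> receipts, String item) {
        Set<String> seen = new HashSet<>();
        for (Receipt r : receipts) {
            if (r.getItems().contains(item)) {
                seen.add(r.getNumber() + "," + r.getDate());
            }
        }
        return seen.size();
    }
}
```

If countGroupsContaining(input, "Bacon") and countGroupsContaining(output, "Bacon") differ, rows really are being lost or merged under the wrong key, rather than just being deduplicated.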
Edit: as the comments suggested, the cause seems to be that i++ and i = i + temp.size() together increment the value of i incorrectly.
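Tracing the sample input shows the skip: the first pass collects the three 123 rows, so i becomes 0 + 3 = 3, and the loop's own i++ then moves it to 4. The row at index 3 (345,1-3-2020,[Bacon]) is therefore never used as a starting row and its line is lost. The inner loop also collects matching numbers from anywhere later in the list, so the jump by temp.size() is only meaningful when rows with the same number are contiguous.

One way to sidestep the index arithmetic entirely is to group by the (number, date) pair in a single pass. The following is a minimal sketch under that assumption, not a drop-in replacement: the class name ReceiptGrouping and the nested Receipt class are hypothetical stand-ins for the real ones, and List.of requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReceiptGrouping {

    // Hypothetical stand-in for the Receipt class described in the question.
    static final class Receipt {
        private final String number;
        private final String date;
        private final Set<String> items;

        Receipt(String number, String date, Set<String> items) {
            this.number = number;
            this.date = date;
            this.items = items;
        }

        String getNumber() { return number; }
        String getDate() { return date; }
        Set<String> getItems() { return items; }
    }

    // Merges receipts that share both number and date, visiting each row exactly once.
    static List<Receipt> convert(List<Receipt> list) {
        // The key is the pair [number, date]; List.equals/hashCode compare element-wise.
        // LinkedHashMap keeps groups in the order they first appear in the input.
        Map<List<String>, Set<String>> grouped = new LinkedHashMap<>();
        for (Receipt r : list) {
            grouped
                .computeIfAbsent(List.of(r.getNumber(), r.getDate()), k -> new LinkedHashSet<>())
                .addAll(r.getItems());
        }

        List<Receipt> receipts = new ArrayList<>();
        for (Map.Entry<List<String>, Set<String>> e : grouped.entrySet()) {
            receipts.add(new Receipt(e.getKey().get(0), e.getKey().get(1), e.getValue()));
        }
        return receipts;
    }

    public static void main(String[] args) {
        List<Receipt> input = List.of(
            new Receipt("123", "1-1-2020", Set.of("Apple")),
            new Receipt("123", "1-2-2020", Set.of("Apple")),
            new Receipt("123", "1-2-2020", Set.of("Beer")),
            new Receipt("345", "1-3-2020", Set.of("Bacon")),
            new Receipt("345", "1-4-2020", Set.of("Cheese")),
            new Receipt("345", "1-4-2020", Set.of("Sausage")),
            new Receipt("345", "1-5-2020", Set.of("Bacon")));

        // Print each merged receipt as number,date,count,[items].
        for (Receipt r : convert(input)) {
            System.out.println(r.getNumber() + "," + r.getDate() + ","
                + r.getItems().size() + "," + r.getItems());
        }
    }
}
```

Because every row is visited exactly once and the key includes both fields, this version does not depend on rows with the same number being adjacent, and rows with the same number but different dates can never be merged together.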