Solution #1: HashSet
A good solution to the immediate problem of reading a file into an ArrayList
with a uniqueness constraint is to simply keep a HashSet
of seen items. Before processing a line, we check that its key is not already in the set. If it isn't, we add the key to the set to mark it as finished, then add the line data to the result ArrayList
.
import java.util.*;
import java.io.*;
public class Main {
public static void main(String[] args)
throws FileNotFoundException, IOException {
String file = "prova.txt";
ArrayList<String[]> data = new ArrayList<>();
HashSet<String> seen = new HashSet<>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
for (String line; (line = br.readLine()) != null;) {
String[] split = line.split("\\s+");
String key = split[0] + " " + split[1];
if (!seen.contains(key)) {
data.add(Arrays.copyOfRange(split, 2, split.length));
seen.add(key);
}
}
}
for (String[] row : data) {
System.out.println(Arrays.toString(row));
}
}
}
Solution #2: LinkedHashMap
/LinkedHashSet
Since we have key-value pairs in this particular dataset, we could roll everything into a LinkedHashMap<String, ArrayList<String>>
(see docs for LinkedHashMap
) which preserves ordering but can't be indexed into (use-case driven decision, but amounts to the same strategy as above. ArrayList<String>
or String[]
is arbitrary here--it could be any data value). Note that this version makes it easy to preserve the most recently seen key rather than the oldest (remove the !data.containsKey(key)
test).
import java.util.*;
import java.io.*;
public class Main {
public static void main(String[] args)
throws FileNotFoundException, IOException {
String file = "prova.txt";
LinkedHashMap<String, ArrayList<String>> data = new LinkedHashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
for (String line; (line = br.readLine()) != null;) {
String[] split = line.split("\\s+");
String key = split[0] + " " + split[1];
if (!data.containsKey(key)) {
ArrayList<String> val = new ArrayList<>();
String[] sub = Arrays.copyOfRange(split, 2, split.length);
Collections.addAll(val, sub);
data.put(key, val);
}
}
}
for (Map.Entry<String, ArrayList<String>> e : data.entrySet()) {
System.out.println(e.getKey() + " => " + e.getValue());
}
}
}
Solution #3: ArrayListSet
The above examples represent pretty narrow use cases. Here's a sketch for a general ArrayListSet
class, which maintains the usual list behavior (add
/set
/remove
etc) while preserving uniqueness.
Basically, the class is an abstraction of solution #1 in this post (HashSet
combined with ArrayList
), but with a slightly different flavor (the data itself is used to determine uniqueness rather than a key, but it's a truer "ArrayList
" structure).
This class solves the problems of efficiency (ArrayList#contains
is linear, so we should reject that solution except in trivial cases), lack of ordering (storing everything directly in a HashSet
doesn't help us), lack of ArrayList
operations (LinkedHashSet
is otherwise the best solution but we can't index into it, so it's not a true replacement for an ArrayList
).
Using a HashMap<E, index>
instead of a HashSet
would speed up remove(Object o)
and indexOf(Object o)
functions (but slow down sort
). A linear remove(Object o)
is the main drawback over a plain HashSet
.
import java.util.*;
public class ArrayListSet<E> implements Iterable<E>, Set<E> {
private ArrayList<E> list;
private HashSet<E> set;
public ArrayListSet() {
list = new ArrayList<>();
set = new HashSet<>();
}
public boolean add(E e) {
return set.add(e) && list.add(e);
}
public boolean add(int i, E e) {
if (!set.add(e)) return false;
list.add(i, e);
return true;
}
public void clear() {
list.clear();
set.clear();
}
public boolean contains(Object o) {
return set.contains(o);
}
public E get(int i) {
return list.get(i);
}
public boolean isEmpty() {
return list.isEmpty();
}
public E remove(int i) {
E e = list.remove(i);
set.remove(e);
return e;
}
public boolean remove(Object o) {
if (set.remove(o)) {
list.remove(o);
return true;
}
return false;
}
public boolean set(int i, E e) {
if (set.contains(e)) return false;
set.add(e);
set.remove(list.set(i, e));
return true;
}
public int size() {
return list.size();
}
public void sort(Comparator<? super E> c) {
Collections.sort(list, c);
}
public Iterator<E> iterator() {
return list.iterator();
}
public boolean addAll(Collection<? extends E> c) {
int before = size();
for (E e : c) add(e);
return size() == before;
}
public boolean containsAll(Collection<?> c) {
return set.containsAll(c);
}
public boolean removeAll(Collection<?> c) {
return set.removeAll(c) && list.removeAll(c);
}
public boolean retainAll(Collection<?> c) {
return set.retainAll(c) && list.retainAll(c);
}
public Object[] toArray() {
return list.toArray();
}
public <T> T[] toArray(T[] a) {
return list.toArray(a);
}
}
Example usage:
public class ArrayListSetDriver {
public static void main(String[] args) {
ArrayListSet<String> fruit = new ArrayListSet<>();
fruit.add("apple");
fruit.add("banana");
fruit.add("kiwi");
fruit.add("strawberry");
fruit.add("apple");
fruit.add("strawberry");
for (String item : fruit) {
System.out.print(item + " "); // => apple banana kiwi strawberry
}
fruit.remove("kiwi");
fruit.remove(1);
fruit.add(0, "banana");
fruit.set(2, "cranberry");
fruit.set(0, "cranberry");
System.out.println();
for (int i = 0; i < fruit.size(); i++) {
System.out.print(fruit.get(i) + " "); // => banana apple cranberry
}
System.out.println();
}
}
Solution #4: ArrayListMap
This class solves a drawback of ArrayListSet
which is that the data we want to store and its associated key may not be the same. This class provides a put
method that enforces uniqueness on a different object than the data stored in the underlying ArrayList
. This is just what we need to solve the original problem posed in this thread. This gives us the ordering and iteration of an ArrayList
but fast lookups and uniqueness properties of a HashMap
. The HashMap
contains the unique values mapped to their index locations in the ArrayList
, which enforces ordering and provides iteration.
This approach solves the scalability problems of using a HashSet
in solution #1. That approach works fine for a quick file read, but without an abstraction, we'd have to handle all consistency operations by hand and pass around multiple raw data structures if we needed to enforce that contract across multiple functions and over time.
As with ArrayListSet
, this can be considered a proof of concept rather than a full implementation.
import java.util.*;
public class ArrayListMap<K, V> implements Iterable<V>, Map<K, V> {
private ArrayList<V> list;
private HashMap<K, Integer> map;
public ArrayListMap() {
list = new ArrayList<>();
map = new HashMap<>();
}
public void clear() {
list.clear();
map.clear();
}
public boolean containsKey(Object key) {
return map.containsKey(key);
}
public boolean containsValue(Object value) {
return list.contains(value);
}
public V get(int i) {
return list.get(i);
}
public boolean isEmpty() {
return map.isEmpty();
}
public V get(Object key) {
return list.get(map.get(key));
}
public V put(K key, V value) {
if (map.containsKey(key)) {
int i = map.get(key);
V v = list.get(i);
list.set(i, value);
return v;
}
list.add(value);
map.put(key, list.size() - 1);
return null;
}
public V putIfAbsent(K key, V value) {
if (map.containsKey(key)) {
if (list.get(map.get(key)) == null) {
list.set(map.get(key), value);
return null;
}
return list.get(map.get(key));
}
return put(key, value);
}
public V remove(int i) {
V v = list.remove(i);
for (Map.Entry<K, Integer> entry : map.entrySet()) {
if (entry.getValue() == i) {
map.remove(entry.getKey());
break;
}
}
decrementMapIndices(i);
return v;
}
public V remove(Object key) {
if (map.containsKey(key)) {
int i = map.remove(key);
V v = list.get(i);
list.remove(i);
decrementMapIndices(i);
return v;
}
return null;
}
private void decrementMapIndices(int start) {
for (Map.Entry<K, Integer> entry : map.entrySet()) {
int i = entry.getValue();
if (i > start) {
map.put(entry.getKey(), i - 1);
}
}
}
public int size() {
return list.size();
}
public void putAll(Map<? extends K, ? extends V> m) {
for (Map.Entry<? extends K, ? extends V> entry : m.entrySet()) {
put(entry.getKey(), entry.getValue());
}
}
public Set<Map.Entry<K, V>> entrySet() {
Set<Map.Entry<K, V>> es = new HashSet<>();
for (Map.Entry<K, Integer> entry : map.entrySet()) {
es.add(new AbstractMap.SimpleEntry<>(
entry.getKey(), list.get(entry.getValue())
));
}
return es;
}
public Set<K> keySet() {
return map.keySet();
}
public Collection<V> values() {
return list;
}
public Iterator<V> iterator() {
return list.iterator();
}
public Object[] toArray() {
return list.toArray();
}
public <T> T[] toArray(T[] a) {
return list.toArray(a);
}
}
Here's the class in action on the original problem:
import java.io.*;
public class Main {
public static void main(String[] args)
throws FileNotFoundException, IOException {
String file = "prova.txt";
ArrayListMap<String, String[]> data = new ArrayListMap<>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
for (String line; (line = br.readLine()) != null;) {
String[] split = line.split("\\s+");
String key = split[0] + " " + split[1];
String[] sub = Arrays.copyOfRange(split, 2, split.length);
data.putIfAbsent(key, sub);
}
}
for (Map.Entry<String, String[]> e : data.entrySet()) {
System.out.println(e.getKey() + " => " +
java.util.Arrays.toString(e.getValue()));
}
for (String[] a : data) {
System.out.println(java.util.Arrays.toString(a));
}
}
}