I'm loading a 1GB ASCII text file with about 38 million rows into a HashSet. Using Java 11, the process takes about 8GB of memory.
HashSet<String> addresses = new HashSet<>(38741847);
try (Stream<String> lines = Files.lines(Paths.get("test.txt"), Charset.defaultCharset())) {
    lines.forEach(addresses::add);
}
System.out.println(addresses.size());
Thread.sleep(100000);
Why is Java taking so much memory?
In comparison, I've implemented the same thing in Python, which takes only 4GB of memory.
import time

s = set()
with open("test.txt") as file:
    for line in file:
        s.add(line)
print(len(s))
time.sleep(1000)
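
To put a rough number on the Java side, one option is to measure heap usage around the set construction at a smaller scale. The following is a minimal sketch under stated assumptions, not a precise profiler: the class `HeapProbe`, its methods, and the synthetic `address-N` strings are hypothetical stand-ins for the real lines of `test.txt`, and `System.gc()` is only a hint to the JVM, so the figure it prints is approximate.

```java
import java.util.HashSet;
import java.util.Set;

public class HeapProbe {
    // Approximate used heap in bytes. System.gc() is only a request,
    // so the returned value should be read as a rough estimate.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Builds a set of n distinct short strings (hypothetical stand-ins
    // for the lines of the real input file).
    static Set<String> buildSet(int n) {
        Set<String> set = new HashSet<>(n * 2);
        for (int i = 0; i < n; i++) {
            set.add("address-" + i);
        }
        return set;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // assumed sample size, far smaller than the real file
        long before = usedHeapBytes();
        Set<String> set = buildSet(n);
        long after = usedHeapBytes();
        System.out.println("entries: " + set.size());
        System.out.println("approx bytes per entry: "
                + (after - before) / (double) set.size());
    }
}
```

Multiplying the per-entry estimate by the real row count gives a ballpark to compare against the observed 8GB, keeping in mind that the real lines may differ in length from the synthetic ones.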