Multiple lookup structures for same data: Memory duplication?

Question

Suppose I have data on a bunch of people, and I want to be able to look them up in different ways. Maybe there's some kind of data structure (like a binary tree) that facilitates lookup by name. And maybe there's another (like a list) that's by order of creation. And perhaps many more.

In many languages, you would have each person allocated exactly once on the heap. Each data structure would contain pointers to that memory. Thus, you're not allocating a new set of people every time you add a new way to look them up.

How about in Haskell? Is there any way to avoid memory duplication when different data structures need to index the same data?

The real question to me is: how would you manage to duplicate the record accidentally? If you use the same construction of the person's record then each mapping will point to the same memory. — Thomas M. DuBuisson, May 15 '13 at 21:48
Ah, well then that actually answers my question! I was wondering if this was the case. In other languages, like C, if you initialize a a struct twice with the same values, you're using twice the memory. But, if I'm understanding you correctly, it sounds like Haskell will deduplicate it automatically. — rlkw1024, May 16 '13 at 03:08
@ThomasM.DuBuisson: easily: say you've created all your data and lookup structure, then you want to normalize it. You do the equivalent of `(fmap . fmap) toUpper` over your multiple lookup tables, and now everything's duplicated. Sure, you could normalize first, but this seems like an easy mistake to make if you aren't careful. — John L, May 16 '13 at 03:19

AndrewC · Accepted Answer · 2013-05-15T22:49:21.293

I feel sure there's a deeper, more knowledgeable answer to this question, but for the time being...

Since in a pure functional programming language data is immutable, there's no need to do anything other than copy the pointer instead of copying its target.

As a quick and very dirty example, I fired up the ghci interpreter:

Prelude> let x = replicate 10000 'm' in all (==x) $ replicate 10000 x
True
(1.61 secs, 0 bytes)

I admit that these stats are unreliable, but what it's not doing is allocating memory for all 10000 copies of a list 10000 characters long.

Summary:

The way to avoid memory duplication is to
(a) use haskell
(b) avoid pointlessly reconstructing your data.

How can I pointlessly reconstruct my data?

A very simple and pointless example:

 pointlessly_reconstruct_list :: [a] -> [a]
 pointlessly_reconstruct_list [] = []
 pointlessly_reconstruct_list (x:xs) = x:xs

This kind of thing causes a duplicate of the list structure.

Have you got any examples that are a little less pointless but still simple?

Interestingly, if you do xs ++ ys you essentially reconstruct xs in order to place ys at the end of it (replacing []), so the list structure of xs is nearly copied wholesale. However, there's no need to replicate the actual data, and there certainly only needs to be one copy of ys.

If you do `xs ++ ys`, then the _spine_ of `xs` is duplicated, not the actual list elements themselves. — MathematicalOrchid, May 16 '13 at 07:51
Indeed. "There's no _need_ to replicate the actual data." - I should have said "There's no _need_ to replicate the actual data of `xs`, just the pointers." — AndrewC, May 16 '13 at 17:48

Multiple lookup structures for same data: Memory duplication?

1 Answers1

Summary:

Linked