4

I'm writing a program that's going to generate a bunch of data. I'd like to find various percentiles over that data.

The obvious way to do this is to store the data in some kind of sorted container. Are there any Haskell libraries which offer a container which is automatically sorted and offers fast random access to arbitrary indexes?

The alternative is to use an unordered container and perform sorting at the end. I don't know if that's going to be any faster. Either way, we're still left with needing a container which offers fast random access. (An array, perhaps...)

Suggestions?

(Another alternative is to build a histogram, rather than keep the entire data set in memory. But since the objective is to compute percentiles extremely accurately, I'm reluctant to go down that route. I also don't know the range of my data until I generate it...)

MathematicalOrchid
  • 61,854
  • 19
  • 123
  • 220
  • 2
    Do streaming algorithms such as those described at http://stackoverflow.com/questions/1248815/percentiles-of-live-data-capture meet your needs? – Jeff Foster Jun 23 '13 at 11:06
  • @JeffFoster This seems relevant to what I'm trying to do. I'm not sure whether this approach will work, but it's worth investigating. – MathematicalOrchid Jun 23 '13 at 13:32

1 Answers1

5

Are there any Haskell libraries which offer a container which is automatically sorted and offers fast random access to arbitrary indexes?

Yes, it's your good old Data.Map. See elemAt and other functions under the «Indexed» category.

Data.Set doesn't offer these, but you can emulate it with Data.Map YourType ().

Roman Cheplyaka
  • 37,738
  • 7
  • 72
  • 121
  • 1
    @MathematicalOrchid: It's simple to augment a search tree to support a `select` operation. Just store the subtree sizes in every node :) So no wonder this was implemented in `Map` – Niklas B. Jun 23 '13 at 14:58