90

I'm looking for a good open source library for math and statistics in Scala. Ideally something like Apache Commons Math or Colt, but implemented in Scala.

Can anyone point me in the right direction?

dave
  • 4
    It might help to explain why you're after a library implemented in Scala, rather than one that's merely usable from Scala. – retronym Jan 07 '12 at 22:06
  • Actually I started to use http://commons.apache.org/proper/commons-math/ and it is easy to use and works fine in Scala. – tom10271 Aug 01 '18 at 03:03

3 Answers

149

Yes, there are some:

ScalaLab

The ScalaLab project aims to provide an efficient scientific programming environment for the Java Virtual Machine. The scripting language is based on the Scala programming language enhanced with high-level scientific operators and with an integrated environment that provides a Matlab-like working style.

The scripting code is extremely fast, close to Java (sometimes slower, sometimes faster), and usually faster than equivalent Matlab .m scripts!

Scalala is now superseded by Breeze

A high performance numeric linear algebra library for Scala, with rich Matlab-like operators on vectors and matrices; a library of numerical routines; support for plotting.

Factorie

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.

Cassovary

By Twitter, for graph processing:

Cassovary is designed from the ground up to efficiently handle graphs with billions of edges. It comes with some common node and graph data structures and traversal algorithms. A typical usage is to do large-scale graph mining and analysis.

At Twitter, Cassovary forms the bottom layer of a stack that we use to power many of our graph-based features, including "Who to Follow" and "Similar to". We also use it for relevance in Twitter Search and the algorithms that determine which Promoted Products users will see. Over time, we hope to bring more non-proprietary logic from some of those product features into Cassovary.

Algebird

Abstract algebra library from Twitter:

Code is targeted at building aggregation systems (via Scalding or Storm). It was originally developed as part of Scalding's Matrix API, where Matrices had values which are elements of Monoids, Groups, or Rings. Subsequently, it was clear that the code had broader application within Scalding and on other projects within Twitter.
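To make the "values which are elements of Monoids" idea concrete, here is a minimal sketch in plain Scala. It does not use Algebird itself: the `Monoid` trait, the two instances and the `sumAll` helper are hypothetical names written for this illustration only.

```scala
object MonoidSketch {
  // A monoid: an associative `plus` together with an identity element `zero`.
  // (Hypothetical trait for illustration; not Algebird's own definition.)
  trait Monoid[A] {
    def zero: A
    def plus(x: A, y: A): A
  }

  object Monoid {
    // Integers under addition.
    implicit val intSum: Monoid[Int] = new Monoid[Int] {
      val zero: Int = 0
      def plus(x: Int, y: Int): Int = x + y
    }

    // Maps merge key by key whenever their values form a monoid.
    implicit def mapMonoid[K, V](implicit v: Monoid[V]): Monoid[Map[K, V]] =
      new Monoid[Map[K, V]] {
        val zero: Map[K, V] = Map.empty
        def plus(x: Map[K, V], y: Map[K, V]): Map[K, V] =
          y.foldLeft(x) { case (acc, (k, value)) =>
            acc.updated(k, v.plus(acc.getOrElse(k, v.zero), value))
          }
      }
  }

  // Aggregation is just a fold with the monoid's zero and plus.
  def sumAll[A](items: Seq[A])(implicit m: Monoid[A]): A =
    items.foldLeft(m.zero)(m.plus)
}
```

Because `plus` is associative, partial results can be computed on separate chunks of the data and combined afterwards, which is presumably why the abstraction suits aggregation systems.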

scala_prob

Experimental status.

sb_probdsl offers simple discrete probabilistic programming support using Scala's new support for delimited continuations.

Malakov

A Markov Chain library for Scala

Markov chains represent stochastic processes where the probability distribution of the next step depends non-trivially on the current step, but does not depend on previous steps. Give this library some training data and it will generate new random data that statistically resembles it.
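For anyone unfamiliar with the idea, a first-order Markov chain can be sketched in a few lines of plain Scala. This is not Malakov's API: `MarkovSketch`, `train` and `generate` are hypothetical names, and the sketch assumes the simplest possible model (each state's successors are sampled in proportion to how often they were observed).

```scala
import scala.util.Random

object MarkovSketch {
  // Each state maps to every state observed right after it in the training data.
  // Duplicates are kept, so a uniform pick reproduces the observed frequencies.
  type Transitions[A] = Map[A, Vector[A]]

  def train[A](data: Seq[A]): Transitions[A] =
    data.zip(data.drop(1))
      .groupBy(_._1)
      .map { case (state, pairs) => state -> pairs.map(_._2).toVector }

  // Walk the chain from `start`, producing at most `maxLength` states.
  // The next state depends only on the current one, never on earlier history.
  def generate[A](table: Transitions[A], start: A, maxLength: Int, rng: Random): Vector[A] = {
    @annotation.tailrec
    def loop(current: A, acc: Vector[A]): Vector[A] =
      if (acc.length >= maxLength) acc
      else table.get(current) match {
        case Some(next) if next.nonEmpty =>
          val chosen = next(rng.nextInt(next.length))
          loop(chosen, acc :+ chosen)
        case _ => acc // dead end: this state never had a successor in training
      }

    if (maxLength <= 0) Vector.empty else loop(start, Vector(start))
  }
}
```

Usage would look something like `MarkovSketch.generate(MarkovSketch.train(words), words.head, 50, new Random())`, where `words` is your own training sequence.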

signal-collect

Signal/Collect is a programming model and framework for large-scale graph processing. The model is expressive enough to concisely formulate many iterated and data-flow algorithms on graphs, while allowing the framework to transparently parallelize the processing.

Grizzled.math

Includes stat and utility packages. Contains very basic and well-known things, such as mean, standard deviation, etc.

Probability Monad:

While it is not a library, it could help you a lot when dealing with probabilities.
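A rough idea of what a probability monad looks like, as a minimal sketch for finite discrete distributions. The names `Dist`, `uniform` and `probability` are hypothetical and chosen for this example; the linked material may well structure things differently.

```scala
object ProbSketch {
  // A finite discrete distribution: outcomes paired with their probabilities.
  final case class Dist[A](outcomes: List[(A, Double)]) {
    // Transform every outcome, keeping its probability.
    def map[B](f: A => B): Dist[B] =
      Dist(outcomes.map { case (a, p) => (f(a), p) }).merged

    // Chain a dependent experiment: probabilities along each path multiply.
    def flatMap[B](f: A => Dist[B]): Dist[B] =
      Dist(outcomes.flatMap { case (a, p) =>
        f(a).outcomes.map { case (b, q) => (b, p * q) }
      }).merged

    // Collapse duplicate outcomes by summing their probabilities.
    def merged: Dist[A] =
      Dist(outcomes.groupBy(_._1).map { case (a, ps) => (a, ps.map(_._2).sum) }.toList)

    // Total probability of the outcomes satisfying `pred`.
    def probability(pred: A => Boolean): Double =
      outcomes.collect { case (a, p) if pred(a) => p }.sum
  }

  // Every listed value is equally likely.
  def uniform[A](values: A*): Dist[A] = {
    val p = 1.0 / values.length
    Dist(values.toList.map(v => (v, p)))
  }
}
```

Since `map` and `flatMap` are defined, distributions compose in a for-comprehension, e.g. `for { a <- die; b <- die } yield a + b` with `val die = ProbSketch.uniform(1, 2, 3, 4, 5, 6)` gives the distribution of the sum of two dice.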

Community
om-nom-nom
  • 4
    You can look at performance comparisons of Scalala and Scalalab vs Python [here](http://mwongstyle.com/wordpress/?p=60) – om-nom-nom Apr 15 '12 at 19:37
  • 15
    There is also [Saddle](http://saddle.github.com/): *Saddle is a data manipulation library for Scala that provides array-backed, indexed, one- and two-dimensional data structures that are judiciously specialized on JVM primitives to avoid the overhead of boxing and unboxing.* – om-nom-nom Apr 02 '13 at 20:04
  • 3
    om-nom-nom, you should raise Saddle to an answer. +1 – metasim Jun 19 '13 at 16:43
  • 1
    @SimeonFitch I'm waiting for a bit of free time, to take a closer look at saddle and perhaps write something more than above excerpt. – om-nom-nom Jun 20 '13 at 01:19
  • 1
    @om-nom-nom : Your link doesn't work anymore. – Pravesh Jain Sep 08 '14 at 05:41
9

Figaro is a Scala library for probabilistic programming. You can find more information about Figaro here: Figaro Reference

Figaro is available for download from Figaro GitHub

The author of this library is currently writing a book on Probabilistic Programming using Figaro. Here is the link to the book page: Probabilistic Programming Book

Ravi
1

Spire

Spire is a numeric library for Scala which is intended to be generic, fast, and precise.

Using features such as specialization, macros, type classes, and implicits, Spire works hard to defy conventional wisdom around performance and precision trade-offs. A major goal is to allow developers to write efficient numeric code without having to "bake in" particular numeric representations. In most cases, generic implementations using Spire's specialized type classes perform identically to corresponding direct implementations.
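To illustrate what writing numeric code without "baking in" a representation means, here is a small sketch. Note that it uses the standard library's `scala.math.Numeric` type class rather than Spire's own type classes, so it only shows the general pattern; `GenericNumericSketch` and `dot` are hypothetical names.

```scala
object GenericNumericSketch {
  // Brings infix operators such as * and + into scope for any type with a Numeric instance.
  import scala.math.Numeric.Implicits._

  // Dot product written once, usable with Int, Double, BigInt, BigDecimal, ...
  def dot[A: Numeric](xs: Seq[A], ys: Seq[A]): A = {
    require(xs.length == ys.length, "vectors must have the same length")
    xs.zip(ys).map { case (x, y) => x * y }.sum
  }
}
```

The same `dot` works for `Seq(1, 2, 3)` and for `Seq(BigInt(2))` alike. With the standard `Numeric`, generic code like this typically boxes primitives; avoiding that cost is the kind of thing the specialization mentioned above is aimed at.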

Make42