Scala collection-like SQL support as in LINQ

Question

As far as I understand the only thing LINQ supports, which Scala currently doesn't with its collection library, is the integration with a SQL Database.

As far as I understand, LINQ can "accumulate" various operations and hand "the whole" statement to the database when queried, so that it is processed there and a simple SELECT does not first copy the whole table into data structures of the VM.

If I'm wrong, I would be happy to be corrected.

If not, what is necessary to support the same in Scala?

Wouldn't it be possible to write a library which implements the collection interface, but is backed by no data structure other than a String, which each subsequent collection operation extends into the required database statement?

Or am I completely wrong with my observations?

Extremely good question. As a person interested in functional languages I am really curious what Scala provides in this respect and if/how you can build LINQ-style support as a library. Please upvote this question. — Stilgar, Dec 06 '10 at 12:17
I know that the Scala team thinks about integrating something like LINQ, but I guess it will take some time... — Landei, Dec 06 '10 at 15:22
UPDATE: These days you should look at Slick - https://github.com/slick — Jack, Jul 16 '12 at 09:54

score 13 · Accepted Answer · answered Dec 06 '10 at 21:41

As the author of ScalaQuery, I don't have much to add to Stilgar's explanation. The part of LINQ which is missing in Scala is indeed the expression trees. That is the reason why ScalaQuery performs all its computations on Column and Table types instead of the basic types of those entities.

You declare a table as a Table object with a projection (tuple) of its columns, e.g.:

object User extends Table[(Int, String)]("user") {
  def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def name = column[String]("name")
  def * = id ~ name
}

User.id and User.name are now of type Column[Int] and Column[String] respectively. All computations are performed in the Query monad (which is a more natural representation of database queries than the SQL statements that have to be created from it). Take the following query:

val q = for(u <- User if u.id < 5) yield u.name

After some implicit conversions and desugaring this translates to:

val q:Query[String] =
  Query[User.type](User).filter(u => u.id < ConstColumn[Int](5)).map(u => u.name)

The filter and map methods do not have to inspect their arguments as expression trees in order to build the query; they just run them. As you can see from the types, what looks superficially like "u.id:Int < 5:Int" is actually "u.id:Column[Int] < ConstColumn[Int](5)". Running this expression results in a query AST like Operator.Relational("<", NamedColumn("user", "id"), ConstColumn(5)). Similarly, the "filter" and "map" methods of the Query monad do not actually perform filtering and mapping but instead build up an AST that describes these operations.

The QueryBuilder then uses this AST to construct the actual SQL statement for the database (with a DBMS-specific syntax).
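To make the mechanism concrete, here is a minimal, self-contained sketch of the same idea. It is a toy model and not ScalaQuery's actual code: Expr, Table, Query, from, render and toSql are hypothetical names invented for this illustration, and NamedColumn, ConstColumn and Relational only borrow the names mentioned above. It assumes a Scala version with withFilter-based for-comprehensions.

```scala
import scala.language.implicitConversions

// Toy model only: these types mimic names used in the answer
// but are NOT ScalaQuery's real classes.
object MiniQuery {
  // AST nodes for column expressions.
  sealed trait Expr[A] {
    // Builds a comparison node; nothing is compared at this point.
    def <(right: Expr[A]): Expr[Boolean] = Relational("<", this, right)
  }
  final case class NamedColumn[A](table: String, name: String) extends Expr[A]
  final case class ConstColumn[A](value: A) extends Expr[A]
  final case class Relational(op: String, left: Expr[_], right: Expr[_]) extends Expr[Boolean]

  // Lifts an Int literal into the AST so that `u.id < 5` type-checks.
  implicit def intToConst(i: Int): Expr[Int] = ConstColumn(i)

  // A table is a meta-object that hands out typed columns.
  abstract class Table(val tableName: String) {
    def column[A](name: String): NamedColumn[A] = NamedColumn[A](tableName, name)
  }
  object User extends Table("user") {
    val id = column[Int]("id")
    val name = column[String]("name")
  }

  // withFilter and map run the lambda once against the table meta-object
  // and record the AST it returns; no rows are touched.
  final case class Query[T <: Table, R](table: T, where: List[Expr[Boolean]], select: R) {
    def withFilter(p: T => Expr[Boolean]): Query[T, R] = Query(table, where :+ p(table), select)
    def map[S](f: T => S): Query[T, S] = Query(table, where, f(table))
  }
  def from[T <: Table](t: T): Query[T, T] = Query(t, Nil, t)

  // Naive rendering; a real builder would escape values or use bind parameters.
  def render(e: Expr[_]): String = e match {
    case NamedColumn(table, name) => table + "." + name
    case ConstColumn(value)       => value.toString
    case Relational(op, l, r)     => "(" + render(l) + " " + op + " " + render(r) + ")"
  }
  def toSql[T <: Table, R <: Expr[_]](q: Query[T, R]): String = {
    val where =
      if (q.where.isEmpty) "" else q.where.map(e => render(e)).mkString(" WHERE ", " AND ", "")
    "SELECT " + render(q.select) + " FROM " + q.table.tableName + where
  }

  // Same shape as the query in the answer.
  val q = for (u <- from(User) if u.id < 5) yield u.name
  val sql = toSql(q)
}
```

In this sketch, MiniQuery.sql evaluates to "SELECT user.name FROM user WHERE (user.id < 5)". The lambdas passed to withFilter and map are ordinary compiled functions; they yield a description of the query only because the values they operate on are column meta-objects rather than Ints and Strings.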

An alternative approach has been taken by ScalaQL which uses a compiler plugin to work directly with expression trees, ensure that they only contain the language subset which is allowed in database queries, and construct the queries statically.

Hi szeiger! Thanks for your answer. So basically ScalaQuery does what I thought of? The statement gets built up internally by the various method calls and is given as a whole to the database. Is that correct? What would be necessary to avoid having to write the class declarations at all? Some reflection magic could create the `def *` automatically, but if everything were removed, would the database need to be available with the right setup at compile time so that the compiler can infer the types correctly? — soc, Dec 06 '10 at 22:19
That's correct. You can go one step further and define a QueryTemplate which is compiled only once to a SQL string and sent to the DBMS as a PreparedStatement to allow efficient reuse when it is called repeatedly with different parameters. Reflection magic could certainly be used to create tables from (case) classes and build queries but the query DSL would have to be untyped. If you want static typing, the compiler needs the Table objects (could be generated automatically during the build process). Or you compile real Collection API calls to queries with a compiler plugin like ScalaQL does. — szeiger, Dec 07 '10 at 18:32

score 13 · Answer 2 · edited Apr 20 '11 at 15:39

I should mention that Scala does have experimental support for expression trees. If you pass an anonymous function as an argument to a method expecting a parameter of type scala.reflect.Code[A], you get an AST.

scala> import scala.reflect.Code      
import scala.reflect.Code 
scala> def codeOf[A](code: Code[A]) = code
codeOf: [A](code:scala.reflect.Code[A])scala.reflect.Code[A]
scala> codeOf((x: Int) => x * x).tree 
res8: scala.reflect.Tree=Function(List(LocalValue(NoSymbol,x,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))),Apply(Select(Ident(LocalValue(NoSymbol,x,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))),Method(scala.Int.$times,MethodType(List(LocalValue(NoSymbol,x$1,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))),PrefixedType(ThisType(Class(scala)),Class(scala.Int))))),List(Ident(LocalValue(NoSymbol,x,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))))))

This has been used in the bytecode generation library 'Mnemonics', which was presented by its author Johannes Rudolph at Scala Days 2010.

Stilgar · Answer 3 · 2010-12-06T14:10:13.317

With LINQ the compiler checks whether the lambda expression is compiled for IEnumerable or for IQueryable. The first works like Scala collections. The second compiles the expression to an expression tree (i.e. a data structure). The power of LINQ is that the compiler itself can translate the lambdas to expression trees. You can write a library that builds expression trees with an interface similar to what you have for collections, but how are you going to make the compiler build data structures (instead of JVM code) from lambdas?

That being said, I am not sure what Scala provides in this respect. Maybe it is possible to build data structures out of lambdas in Scala, but in any case I believe you need a similar feature in the compiler to build support for databases. Mind you that databases are not the only underlying data source that you can build providers for. There are numerous LINQ providers for things like Active Directory or the eBay API, for example.

Edit: Why there cannot be just an API?

In order to make queries you do not only use the API methods (filter, Where, etc.) but you also pass lambda expressions as arguments to these methods, e.g. .Where(x => x > 3) (C# LINQ). The compiler normally translates the lambdas to bytecode. The API, however, needs data structures (expression trees) that it can translate for the underlying data source. Basically you need the compiler to do this for you.
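A small sketch of the limitation on the Scala side, assuming nothing beyond the standard library (the object and value names are made up for illustration): once a lambda has been compiled to a plain function value, a library receiving it can call it but has no supported way to look inside its body.

```scala
object OpaqueLambda {
  // An ordinary compiled function value, like the argument of filter.
  val predicate: Int => Boolean = x => x > 3

  // All a library can do with it is apply it to concrete values.
  val results: List[Boolean] = List(1, 5).map(predicate)

  // Function1 exposes apply, compose and andThen, but nothing that
  // reveals that the body was the comparison "x > 3".
  val negated: Int => Boolean = predicate.andThen(b => !b)
}
```

Here OpaqueLambda.results is List(false, true). Translating the predicate to something like "WHERE x > 3" would require the structure of the body, which the function value does not carry.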

Disclaimer 1: Maybe (just maybe) there is some way to create proxy objects that execute the lambdas but overload the operators to produce data structures. This would result in slightly worse performance than the actual LINQ (runtime vs compile time). I am not sure if such a library is possible. Maybe the ScalaQuery library uses a similar approach.

Disclaimer 2: Maybe the Scala language actually can provide the lambdas as inspectable objects so that you can retrieve the expression tree. This would make the lambda feature in Scala equivalent to the one in C#. Maybe the ScalaQuery library uses this hypothetical feature.

Edit 2: I did a bit of digging. It seems like ScalaQuery uses the library approach and overloads a bunch of operators to produce the trees at runtime. I am not entirely sure about the details because I am not familiar with Scala terminology and have a hard time reading the complex Scala code in the article: http://szeiger.de/blog/2008/12/21/a-type-safe-database-query-dsl-for-scala/

Like every object which can be used in or returned from a query, a table is parameterized with the type of the values it represents. This is always a tuple of individual column types, in our case Integers and Strings (note the use of java.lang.Integer instead of Int; more about that later). In this respect SQuery (as I’ve named it for now) is closer to HaskellDB than to LINQ because Scala (like most languages) does not give you access to an expression’s AST at runtime. In LINQ you can write queries using the real types of the values and columns in your database and have the query expression’s AST translated to SQL at runtime. Without this option we have to use meta-objects like Table and Column to build our own AST from these.

Very cool library indeed. I hope in the future it gets the love it deserves and becomes a real production-ready tool.

Wouldn't it be enough to have an API which just assembles the query string and uses that to query the database? — soc, Dec 06 '10 at 12:28
@Stilgar: Didn't Microsoft even deprecate the LINQ-to-SQL stuff some time ago? I remember that they ripped out the possibility to support other database vendors before their first release to only allow MS SQL ... — soc, Dec 06 '10 at 12:33
LINQ to SQL is not deprecated, but it is no longer the main focus of development, though they still add optimizations and small features. Entity Framework is. My personal opinion is that LINQ to SQL was just a production-ready proof of concept. It is such a cool proof of concept that many people love it for its simplicity. However, in the context of this question the provider does not really matter, be it LINQ to SQL, EF, Active Directory, etc. What is important is that there is an expression tree. You can write code to translate this tree into whatever you like. BTW StackOverflow runs on LINQ to SQL. — Stilgar, Dec 06 '10 at 12:38
"[...] you do not only use the API methods (filter, Where, etc...) but you also use lambda expressions as arguments of these methods." Mhh, I guess I miss the point here ... — soc, Dec 06 '10 at 14:30
You can RUN the lambda you get as an argument but you cannot inspect its CODE with your own code so that you could translate that lambda code to SQL. This is why it cannot be done in Scala the same way it is done in C#. As it seems you can wrap the lambdas in a monad of some kind and query the proxy objects that can then produce the SQL. I do not entirely understand how it is done but it is obviously possible. Note that the table classes use wrappers (Column) instead of Integer for the fields. These classes somehow convert the lambdas to SQL code. — Stilgar, Dec 06 '10 at 15:12
There's no operator overloading in ScalaQuery, actually. It's just that `x` inside a for-comprehension would not stand for an `Int`, but for a `Column`. That makes it possible to provide the operators (which are actually in another class, offered through implicit conversion) that return a condition, so to speak, instead of a `Boolean`. While this is done at run-time, the resulting object is a `Query`, which need only be created once and may be re-used. — Daniel C. Sobral, Dec 06 '10 at 20:18
I see, so this is not operator overloading but simply defining a new operator on the new type. — Stilgar, Dec 06 '10 at 22:27

score 4 · Answer 4 · answered Dec 06 '10 at 12:10

You probably want something like http://scalaquery.org/. It does exactly what @Stilgar's answer suggests, except it's only SQL.

pedrofurla


Very interesting. How do they translate the lambdas to expression trees? – Stilgar Dec 06 '10 at 12:16
I know scalaquery, but I'm wondering if it is possible to integrate the query into the collection API without having a completely different API like the database query APIs we have. – soc Dec 06 '10 at 12:30
Additionally in Scalaquery you have to describe your DB tables in the code again, afaik this is not necessary in LINQ. – soc Dec 06 '10 at 12:37
This is not necessary in LINQ to SQL only because it has a code generator that does it for you. It is possible to write such a code generator for ScalaQuery, and if ScalaQuery becomes popular someone will. Theoretically you can add some level of dynamism to your queries and avoid even the code generation, but then you will lose some type safety. BTW there is very cool research by the F# team on something called Type Providers, intended to replace the code generation in this and similar cases, but that is an entirely different discussion. – Stilgar Dec 06 '10 at 12:43
@soc Scalaquery uses `flatMap`, `map` and `foreach`, just like collections, so it should have some integration at the for-comprehension level. What kind of integration do you have in mind? – Daniel C. Sobral Dec 06 '10 at 19:54
@Stilgar It doesn't. ScalaQuery does not use expression trees to do what it does. Monads suffice for it. You can think of it like a collection's `view`: it accumulates the operations, and processes them when finally applied. – Daniel C. Sobral Dec 06 '10 at 19:57
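The `view` analogy from the last comment can be tried directly with the standard collections. The sketch below uses made-up names (ViewDemo, evaluated, pending) and a counter as a probe to show that the operations are only recorded until a result is demanded.

```scala
object ViewDemo {
  // Counts how often the filter predicate actually runs.
  var evaluated = 0

  // filter and map on a view only record the operations.
  val pending = List(1, 2, 3, 4).view
    .filter { x => evaluated += 1; x % 2 == 0 }
    .map(_ * 10)

  // Snapshot taken before anything was forced: still 0.
  val before = evaluated

  // Forcing the view runs the recorded pipeline.
  val result = pending.toList

  // Now the predicate has been invoked.
  val after = evaluated
}
```

ViewDemo.before is 0 and ViewDemo.result is List(20, 40). A query DSL applies the same deferral, except that the accumulated description is handed to the database instead of being run over an in-memory collection.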

score 1 · Answer 5 · answered Dec 06 '10 at 21:48

check out http://squeryl.org/

moi_meme

