How is a CLR table valued function 'streaming''?

Question

The MSDN Docs on table-valued Sql Clr functions states:

Transact-SQL table-valued functions materialize the results of calling the function into an intermediate table. ... In contrast, CLR table-valued functions represent a streaming alternative. There is no requirement that the entire set of results be materialized in a single table. The IEnumerable object returned by the managed function is directly called by the execution plan of the query that calls the table-valued function, and the results are consumed in an incremental manner. ... It is also a better alternative if you have very large numbers of rows returned, because they do not have to be materialized in memory as a whole.

Then I find out that no data access is allowed in the 'Fill row' method. This means that you still have to do all of your data access in the init method and keep it in memory, waiting for 'Fill row' to be called. Have I misunderstood something? If I don't force my results into an array or list, I get an error: 'ExecuteReader requires an open and available Connection. The connection's current state is closed.'

Code sample:

[<SqlFunction(DataAccess = DataAccessKind.Read, FillRowMethodName = "Example8Row")>]
static member InitExample8() : System.Collections.IEnumerable = 
   let c = cn() // opens a context connection
   // I'd like to avoid forcing enumeration here:
   let data = getData c |> Array.ofSeq
   data :> System.Collections.IEnumerable

static member Example8Row ((obj : Object),(ssn: SqlChars byref)) = 
   do ssn <- new SqlChars(new SqlString(obj :?> string))
   ()

I'm dealing with several million rows here. Is there any way to do this lazily?

The way I read the documentation seems to imply that with a table-valued function the entire result set is generated and put somewhere - probably memory if it's small and tempdb otherwise - before its results are returned to the client. With CLR the result set can start being returned to the client directly from a memory buffer as soon as some records are available. I don't know if it's something you need to worry about explicitly. I think MSDN is just explaining the internal workings of both types of table function. Unless I've misunderstood the article. — Yuck, Apr 23 '11 at 11:44
I believe `yield return` works in C#. I would expect `seq { }` to work similarly. No? — Daniel, May 05 '11 at 18:44
@Daniel - That's what I was trying. I want to remove '|> Array.ofSeq' and use yield return instead but that leads me to this error. That is what this question is about. When yielding, data access is performed in the Example8Row function, which doesn't seem to be allowed. — Robert Jeppesen, May 05 '11 at 21:21
@Yuck and Robert: Yes, the "streaming" part refers to the ability to use `yield return` to send results back prior to the end of the method. This is an option that you could do, or you can build up the collection and send it all back at once at the end of the method. For T-SQL Multi-statement TVFs, there is no choice. All they can do is build up the results in a Table Variable which can only be returned at the end of the function. — Solomon Rutzky, Feb 12 '16 at 03:29

score 9 · Accepted Answer · edited Jan 17 '14 at 06:45

I'm assuming you're using SQL Server 2008. As mentioned by a Microsoft employee on this page, 2008 requires methods to be marked with DataAccessKind.Read much more frequently than 2005. One of those times is when the TVF participates in a transaction (which seemed to always be the case, when I tested). The solution is to specify enlist=false in the connection string, which, alas, cannot be combined with context connection=true. That means your connection string needs to be in typical client format: Data Source=.;Initial Catalog=MyDb;Integrated Security=sspi;Enlist=false and your assembly must be created with permission_set=external_access, at minimum. The following works:

using System;
using System.Collections;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

namespace SqlClrTest {
    public static class Test {
        [SqlFunction(
            DataAccess = DataAccessKind.Read,
            SystemDataAccess = SystemDataAccessKind.Read,
            TableDefinition = "RowNumber int",
            FillRowMethodName = "FillRow"
            )]
        public static IEnumerable MyTest(SqlInt32 databaseID) {
            using (var con = new SqlConnection("data source=.;initial catalog=TEST;integrated security=sspi;enlist=false")) {
                con.Open();
                using (var cmd = new SqlCommand("select top (100) RowNumber from SSP1 where DatabaseID = @DatabaseID", con)) {
                    cmd.Parameters.AddWithValue("@DatabaseID", databaseID.IsNull ? (object)DBNull.Value : databaseID.Value);
                    using (var reader = cmd.ExecuteReader()) {
                        while (reader.Read())
                            yield return reader.GetInt32(0);
                    }
                }
            }
        }
        public static void FillRow(object obj, out SqlInt32 rowNumber) {
            rowNumber = (int)obj;
        }
    }
}

Here's the same thing in F#:

namespace SqlClrTest

module Test =

    open System
    open System.Data
    open System.Data.SqlClient
    open System.Data.SqlTypes
    open Microsoft.SqlServer.Server

    [<SqlFunction(
        DataAccess = DataAccessKind.Read,
        SystemDataAccess = SystemDataAccessKind.Read,
        TableDefinition = "RowNumber int",
        FillRowMethodName = "FillRow"
        )>]
    let MyTest (databaseID:SqlInt32) =
        seq {
            use con = new SqlConnection("data source=.;initial catalog=TEST;integrated security=sspi;enlist=false")
            con.Open()
            use cmd = new SqlCommand("select top (100) RowNumber from SSP1 where DatabaseID = @DatabaseID", con)
            cmd.Parameters.AddWithValue("@DatabaseID", if databaseID.IsNull then box DBNull.Value else box databaseID.Value) |> ignore
            use reader = cmd.ExecuteReader()
            while reader.Read() do
                yield reader.GetInt32(0)
        } :> System.Collections.IEnumerable

    let FillRow (obj:obj) (rowNumber:SqlInt32 byref) =
        rowNumber <- SqlInt32(unbox obj)

The good news is: Microsoft considers this a bug.

Thanks Daniel, I had read somewhere about being able to do it with a connection that is not the context connection, but had considered it too messy. It is unfortunate the the Clr assembly needs to know how to connect to the database it lives in. :) — Robert Jeppesen, May 06 '11 at 06:52
As an aside, you don't need to compile with --standalone. Infact, this can cause problems if you use the same assembly from other assemblies, the types in FSharp.Core are essentially made copies of, and are no longer compatible with the version of FSharp.Core that regular assemblies uses. I just register FSharp.Core in Sql Server + whatever else I need. — Robert Jeppesen, May 06 '11 at 06:55
@Robert: Makes sense. I hadn't thought of that. I removed it from my answer. — Daniel, May 06 '11 at 18:12
The "Enlist=true" is very new for me. Now I can use yield return, great! Thanks! — TcKs, Jan 10 '12 at 20:42
@Daniel What's the effect of "enlist=false"? Will the TVF still be able to see modifications done thus far by the transaction? — Branko Dimitrijevic, Nov 25 '15 at 06:37
@BrankoDimitrijevic The effect of "Enlist=false;" is that the external connection is _not_ part of a distributed transaction. It is completely separate, just like all other sessions. So the only way it could see changes not yet committed is by doing dirty reads (`WITH (NOLOCK)` or `SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED`). — Solomon Rutzky, Feb 12 '16 at 03:25

score 1 · Answer 2 · answered Jan 22 '13 at 21:20

What you can do is to wrap an SqlDataReader class with an IEnumerable which uses an enumerator which, when its "Next" method is called, does MoveNext on the SqlDataReader and returns the SqlDataReader. Then, your FillRow method expects SqlDataReader as a class. If you have your enumerator close the database connection and the SqlDataReader when it can't "next" any more, then you've effectively streamed your output to the FillRows function. You can do this with a ContextConnection=true as well...

...the trouble here is that you have to be able to return the results of an actual query: if you're doing more complex things to create your result set, then you're out of luck.

do you have a code example? – tire0011 Feb 16 '23 at 13:43 — tire0011, Feb 16 '23 at 13:43

score 1 · Answer 3 · answered Apr 26 '11 at 21:49

Yes, you would need to pull the results into memory and then return from there. Although the intention would be that you avoid the need to do such operations.

You can see an example of the approach in one of the sections of the MSDN doc you linked to ("Sample: Returning the results of a SQL Query")

The examples are a bit contrived though as a real-world implementation of email validation would use a scalar rather than table function - returning a bool for each input email value rather than a list of those that are invalid.

Are you able to explain a bit more about what you're trying to achieve? There might be a better way of structuring the function.

-1 No need to pull everything into memory. The issue is more subtle than that. — Daniel, May 05 '11 at 22:45

How is a CLR table valued function 'streaming''?

3 Answers3

Linked