1

I need to retrieve a large dataset from a MySQL server for processing. I am using the MySQL .net connector for this, as the application that is consuming the data is written in F#. The dataset is much too large to fit into memory, so I want avoid holding the raw data in memory altogether and operate on it as it streams into the application from the database server.

I've read that this can be done by using the ResultSet properties in the JDBC API a la Streaming large result sets with MySQL and http://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html, but I haven't found something similar for the .Net API.

So far my search has been primarily through the MySQL documentation at http://dev.mysql.com/doc/connector-net/en/connector-net-programming.html, but this hasn't turned up anything (at least, nothing obvious to me).

How might I accomplish what I'm looking to do?

Update:

Some details about my limitations.

I only have read access to the data source, as it is managed by a third party, so making any changes server-side is a no go.

Here's what the query I'm pushing to the server looks like:

SELECT
      SID.dwsi_store AS 'store'
     ,SID.dwsi_transaction_date AS 'transactionDate'
     ,SID.dwsi_transaction AS 'transaction'
     ,SID.dwsi_item AS 'sku'
     ,C.dwcl_class AS 'clss'
     ,D.dwde_department AS 'department'
 FROM
     dw_sls_item_dtl SID
 JOIN
     dw_item I
 JOIN
     dw_class C
 JOIN
     dw_department D
 ON
         SID.dwsi_store = I.dwin_store
     AND SID.dwsi_item = I.dwin_item_number
     AND I.dwin_store = C.dwcl_store
     AND I.dwin_class = C.dwcl_class
     AND I.dwin_store = D.dwde_store_number
     AND I.dwin_department = D.dwde_department
 WHERE
         SID.dwsi_transaction_date >= '2007-03-01'
     AND SID.dwsi_store BETWEEN '2' AND '8'
     AND NOT C.dwcl_class = ''
     AND NOT C.dwcl_class_name LIKE('%CCL''d%')
     AND D.dwde_department BETWEEN '10' AND '92'

While I could manually page through the data by parameterizing the where clause and retrieving the data one date at a time, I would prefer not to do this if the MySQL connector has a more elegant solution available.

Update

Here's the code that is calling the database server. It is using the ExecuteReader method which returns a DbDataReader. Now that I think of it, the problem is probably that I'm reading everything into a sequence without operating on it. It appears this issue is related to how I'm implementing the reader, not on the reader itself.

use cn = new MySqlConnection(cs)
use cmd = new MySqlCommand(sql,cn)
cmd.CommandType = CommandType.Text |> ignore
cn.Open()
use reader = cmd.ExecuteReader()
while reader.Read() do
    yield { store = unbox (reader.["store"])
            transactionDate = unbox (reader.["transactionDate"])
            transaction = unbox (reader.["transaction"])
            sku = unbox (reader.["sku"]) 
            clss = unbox(reader.["clss"])
            department = unbox(reader.["department"])} }
Community
  • 1
  • 1
Joe
  • 776
  • 7
  • 20
  • Maybe you should paginate your queries and get a result per parts instead load everything into memory. Can you do this? – Felipe Oriani Sep 08 '14 at 20:37
  • @FelipeOriani, that's a good idea, but I would prefer to not do that if there are any alternatives available. See update in my question. – Joe Sep 08 '14 at 20:47

1 Answers1

1

I think you are looking for a derived class of DbDataReader; likely OdbcDataReader or OleDbDataReader. These classes give forward-only access to a result set. See the links below for more information:

http://msdn.microsoft.com/en-us/library/haa3afyz(v=vs.110).aspx ("The DataReader is a good choice when retrieving large amounts of data because the data is not cached in memory.")

http://msdn.microsoft.com/en-us/library/system.data.common.dbdatareader(v=vs.110).aspx

http://msdn.microsoft.com/en-us/library/system.data.odbc.odbcdatareader(v=vs.110).aspx

http://msdn.microsoft.com/en-us/library/system.data.oledb.oledbdatareader(v=vs.110).aspx

phoog
  • 42,068
  • 6
  • 79
  • 117
  • I was using a dbdatareader, but it occurs to me that the issue is that I was pushing all of the results into an enumerable object rather than operating on them as they came in. The problem was on my end, not on the connection end the whole time. Thanks for your response though, it helped me see what I was doing wrong. – Joe Sep 08 '14 at 21:43