
I have a huge (26GB) sqlite database that I want to import to SQL Server with SSIS.

I have everything set up correctly. Some of the data flows work and import the data.

Data flows are simple. They just consist of source and destination.

But when it comes to a table that has 80 million rows, the data flow fails with this unhelpful message:

Code: 0xC0047062
Source: Data Flow Task Source 9 - nibrs_bias_motivation [55]
Description: System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] unknown error (7)

at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader, Object[] methodArguments, SQL_API odbcApiMethod)
at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader)
at System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Odbc.OdbcCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
at Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter.PreExecute()
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

Before the task fails, memory usage climbs to 99%. This made me think it's a memory issue, but I don't know how I can solve it.

I tried setting DelayValidation to true on all data flow tasks. Nothing changed. I played with the buffer sizes. Nothing.

What can I do?

Hadi
Can Vural
  • I would add a FOR LOOP container and load data in chunks, e.g. 1M records per batch. – Lukasz Szozda Jan 13 '19 at 12:03
  • Please, what is your MS SQL version? – Sanpas Jan 13 '19 at 13:38
  • @pascalsanchez SQL Server 2017 (14.0.1000.169) – Can Vural Jan 13 '19 at 14:50
  • @CanVural Is it the Express edition? – Sanpas Jan 13 '19 at 15:00
  • Copy the questionable Data Flow Task from the current package (I presume there are many in there) into a new package and then run the SSIS package without debugging (Ctrl-F5 is the default keystroke in VS/SSDT). Also, I assume you are not loading to an instance of SQL Server on your current machine – if you are, you need to specify the maximum server memory. – billinkc Jan 13 '19 at 16:47
  • @pascalsanchez No it's not. – Can Vural Jan 13 '19 at 17:15
  • @billinkc I will try that. And no, SQL Server is on the same machine. Everything is on my laptop. – Can Vural Jan 13 '19 at 17:16
  • Does the data flow fail before loading the data or after it starts reading rows from the source? Also, what type of source are you using? What is the Max commit size specified in the OLE DB destination? – Hadi Jan 13 '19 at 17:38
  • @Hadi The data flow fails during the pre-execute phase, so I don't know if it starts reading rows. The source is an ODBC source, because only an ODBC driver is available for SQLite as far as I know. How can I check the max commit size? – Can Vural Jan 13 '19 at 17:50
  • @CanVural If it's failing during pre-execute, it is not related to max commit size. – Hadi Jan 13 '19 at 17:51
  • @CanVural Try reading your data in chunks as suggested in the first comment. – Hadi Jan 13 '19 at 17:53
  • @CanVural If you need more details, I will try to give you a detailed answer. – Hadi Jan 13 '19 at 18:10
  • @Hadi OK, I'll try to read the data in chunks and insert. Thank you for your help. I'll write later when I'm finished or have some problems. – Can Vural Jan 13 '19 at 18:20
  • @Hadi OK, a little bit of help needed. What should it look like in general? I thought I would have some variables, use a for loop with the max row count as the end condition, then fetch data with LIMIT and OFFSET. But I don't think this is the answer. – Can Vural Jan 13 '19 at 19:15
  • @CanVural A few minutes and I will provide a detailed answer. – Hadi Jan 13 '19 at 19:36
  • I provided more details on how to read data in chunks, check it out. I hope it helps. – Hadi Jan 13 '19 at 20:11

1 Answer


Step-by-step guide

Since the error is thrown when reading from a large dataset, try reading the data in chunks. To achieve that you can follow these steps:

  1. Declare two variables of type Int32: @[User::RowCount] and @[User::IncrementValue]
  2. Add an Execute SQL Task that executes a SELECT COUNT(*) command and stores the result set in the variable @[User::RowCount]

  3. Add a For Loop container configured to loop while @[User::IncrementValue] < @[User::RowCount] and to increase @[User::IncrementValue] by the chunk size (500000 here) on each iteration

  4. Inside the For Loop container add a Data Flow Task
  5. Inside the Data Flow Task add an ODBC Source and an OLE DB Destination
  6. In the ODBC Source, select the SQL Command option and write a SELECT * FROM MYTABLE query (to retrieve metadata only)
  7. Map the columns between source and destination
  8. Go back to the Control Flow, click on the Data Flow Task, and hit F4 to view the Properties window
  9. In the Properties window, go to Expressions and assign the following expression to the [ODBC Source].[SQLCommand] property (for more info refer to How to pass SSIS variables in ODBC SQLCommand expression?):

    "SELECT * FROM MYTABLE ORDER BY ID_COLUMN
    LIMIT 500000
    OFFSET " + (DT_WSTR,50)@[User::IncrementValue]"
    

Where MYTABLE is the source table name, and ID_COLUMN is your primary key or identity column.
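Outside SSIS, the same count-then-page pattern can be sketched with Python's built-in sqlite3 module. This is only an illustration of the loop logic described above; the function, table, and column names are placeholders, not part of the original package:

```python
import sqlite3

def read_in_chunks(db_path, table, id_column, chunk_size=500_000):
    """Yield rows from `table` in fixed-size batches using LIMIT/OFFSET,
    mirroring the For Loop + SQLCommand expression in the SSIS package."""
    conn = sqlite3.connect(db_path)
    try:
        # Equivalent of the Execute SQL Task that fills @[User::RowCount]
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        offset = 0
        while offset < total:  # the For Loop's exit condition
            rows = conn.execute(
                f"SELECT * FROM {table} ORDER BY {id_column} LIMIT ? OFFSET ?",
                (chunk_size, offset),
            ).fetchall()
            yield rows  # in SSIS, each batch flows into the OLE DB Destination
            offset += chunk_size  # the loop's increment expression
    finally:
        conn.close()
```

Note that the ORDER BY on a stable key matters: without it, SQLite gives no guarantee that successive OFFSET windows don't skip or repeat rows.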

Control Flow Screenshot



Hadi
  • Thank you for the detailed answer! So I was on the right track also, but your answer helped too. One thing to note: when I selected the ODBC Source in the for loop data flow, the `[ODBC Source].[SQLCommand]` property was not available in the expressions, so I changed it to `ADO.NET` and the property was there. I don't know why. – Can Vural Jan 14 '19 at 10:13