
The Problem:

I have a web application where people can upload XML, XLSX and CSV files. I then take their content and insert it into my Oracle DB.


Technical details:

I recently ran into an OutOfMemoryException while processing the uploaded data. The previous developer built a list of lists over the data in order to manage it, but this throws an OutOfMemoryException. We are using the LinqToExcel library.

Sample code:

var excel = new ExcelQueryFactory(excelFile);
IEnumerable<RowNoHeader> data = from row in excel.WorksheetNoHeader(sheetName)
                                select row;

// data.Count() enumerates the whole worksheet once just to size the list
List<List<string>> d = new List<List<string>>(data.Count());
foreach (RowNoHeader row in data)
{
    List<string> list = new List<string>();
    foreach (Cell cell in row)
    {
        string cellValue = cell.Value.ToString().Trim();
        list.Add(cellValue);
    }

    d.Add(list);
}

I tried to change the code and read the worksheet through OLE DB instead:

string connectionstring = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0;HDR=YES;';", excelFile);
OleDbConnection connection = new OleDbConnection();
connection.ConnectionString = connectionstring;
OleDbCommand excelCommand = new OleDbCommand();
excelCommand.Connection = connection;
excelCommand.CommandText = String.Format("Select * FROM [{0}$]", sheetName);
connection.Open();
DataTable dtbl = CreateTable(TableColumns);
OleDbDataReader reader = excelCommand.ExecuteReader();
while (reader.Read())
{
    DataRow row = dtbl.NewRow();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        row[i] = reader[i];   // copy the current reader row into the DataRow
    }
    dtbl.Rows.Add(row);
}

using (OracleCommand command = new OracleCommand(selectCommand, _oracleConnection))
{
    using (OracleDataAdapter adapter = new OracleDataAdapter(command))
    {
        using (OracleCommandBuilder builder = new OracleCommandBuilder(adapter))
        {
            OracleTransaction trans = _oracleConnection.BeginTransaction();
            command.Transaction = trans;
            adapter.InsertCommand = builder.GetInsertCommand(true);
            adapter.Update(dtbl);
            trans.Commit();
        }
    }
}

However, I still get the same OutOfMemoryException. I have read online that I should build my project as x64 and use the following:

<runtime>
    <gcAllowVeryLargeObjects enabled="true" />    
</runtime>

However, I can't change my web application to run on x64.

My solution was to process the data in batches, like this:

int rowCount = 0;
while (reader.Read())
{
    DataRow row = dtbl.NewRow();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        row[i] = reader[i];
    }
    dtbl.Rows.Add(row);
    rowCount++;

    if (rowCount % _batches == 0)
    {
        DBInsert(dtbl, selectCommand);
        dtbl = CreateTable(TableColumns);   // start a fresh table for the next batch
    }
}

// flush the final partial batch
if (dtbl.Rows.Count > 0)
{
    DBInsert(dtbl, selectCommand);
}

private void DBInsert(DataTable dt, string selectCommand)
{
    using (OracleCommand command = new OracleCommand(selectCommand, _oracleConnection))
    {
        using (OracleDataAdapter adapter = new OracleDataAdapter(command))
        {
            using (OracleCommandBuilder builder = new OracleCommandBuilder(adapter))
            {
                OracleTransaction trans = _oracleConnection.BeginTransaction();
                command.Transaction = trans;
                adapter.InsertCommand = builder.GetInsertCommand(true);
                adapter.Update(dt);
                trans.Commit();
            }
        }
    }
}

It works, but it is very slow. I was wondering if there is a way to either solve the memory problem while processing serially, or to write the data in parallel.

I have tried to insert the data in parallel using threads, but this also consumes a lot of memory and throws an OutOfMemoryException.

Andreas Lymbouras
  • Just *don't* load 1M rows into a DataTable. Use whatever bulk import mechanism is available to load a stream of rows. Oracle, like SQL Server, offers several ways to bulk import data – Panagiotis Kanavos Sep 27 '17 at 12:17
  • Hello Panagiotis, thanks for your comment. It uses OracleCommand.ArrayBindCount; however, you still need to have arrays of values (see the sketch after these comments). – Andreas Lymbouras Sep 27 '17 at 12:31
  • Bulk import doesn't mean sending arrays or batching multiple commands. That's *still* an INSERT/UPDATE operation with all its problems. *Bulk* operations are meant to load a lot of data as fast as possible, using minimal logging, locking and keeping a minimal amount of rows idle in memory. – Panagiotis Kanavos Sep 27 '17 at 13:07
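
Since the comments settled on array binding with the managed driver, here is a minimal sketch of what an ArrayBindCount insert could look like. The target table MYTABLE, its two VARCHAR2 columns and the ReadNextBatch() helper are assumptions for illustration, not code from the question:

using System.Collections.Generic;
using System.Linq;
using Oracle.ManagedDataAccess.Client;

// 'batch' holds one batch of rows, e.g. collected while reading the worksheet.
List<string[]> batch = ReadNextBatch();   // hypothetical helper

// One array per column; ArrayBindCount tells ODP.NET to execute the single
// INSERT once per array element, in one round trip to the database.
string[] colA = batch.Select(r => r[0]).ToArray();
string[] colB = batch.Select(r => r[1]).ToArray();

using (OracleCommand cmd = _oracleConnection.CreateCommand())
{
    cmd.CommandText = "INSERT INTO MYTABLE (COL_A, COL_B) VALUES (:a, :b)";
    cmd.ArrayBindCount = colA.Length;
    cmd.Parameters.Add(new OracleParameter("a", OracleDbType.Varchar2) { Value = colA });
    cmd.Parameters.Add(new OracleParameter("b", OracleDbType.Varchar2) { Value = colB });
    cmd.ExecuteNonQuery();                // inserts the whole batch at once
}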

1 Answer


Just don't load 1M rows into a DataTable. Use whatever bulk import mechanism is available to load a stream of rows. Oracle, like SQL Server, offers several ways to bulk import data.

Collections like List or DataTable use an internal buffer to store data that they reallocate when it fills up, using twice the original size. With 1M rows that leads to a lot of reallocations and a lot of memory fragmentation. The runtime may no longer be able to even find a contiguous block of memory large enough to store 2M entries. That's why it's important to set the capacity parameter when creating a new List.
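
For illustration (a toy sketch of the growth behavior, not code from the question), pre-sizing a list allocates its backing array once instead of growing it by doubling:

using System;
using System.Collections.Generic;

// Without a capacity the backing array is reallocated and copied every time it
// fills up (4, 8, 16, ... elements), which fragments memory for very large lists.
var grown = new List<string[]>();

// Pre-sizing allocates the backing array once.
var preSized = new List<string[]>(1000000);
Console.WriteLine(preSized.Capacity);   // 1000000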

Apart from that, it doesn't serve any purpose to load everything into memory and then send it to the database. It's actually faster to send the data as soon as each file is read, or as soon as a sufficiently large number of rows has been loaded. Instead of trying to load 1M rows at once, read 500 or 1000 of them each time and send them to the database.
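
A minimal sketch of that idea, reusing the question's own CreateTable/DBInsert helpers and OleDbDataReader (the 1000-row batch size is only an example):

const int batchSize = 1000;
DataTable batch = CreateTable(TableColumns);

while (reader.Read())
{
    DataRow row = batch.NewRow();
    for (int i = 0; i < reader.FieldCount; i++)
        row[i] = reader[i];               // copy the current source row
    batch.Rows.Add(row);

    if (batch.Rows.Count == batchSize)
    {
        DBInsert(batch, selectCommand);   // send one batch to Oracle
        batch.Clear();                    // reuse the table instead of rebuilding it
    }
}

if (batch.Rows.Count > 0)
    DBInsert(batch, selectCommand);       // flush the final partial batch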

Furthermore, Oracle's ADO.NET provider includes the OracleBulkCopy class that works in a way similar to SqlBulkCopy for SQL Server. The WriteToServer method can accept a DataTable or a DataReader. You can use the DataTable overload to send batches of items. An even better idea is to use the overload that accepts a reader and let the class collect the batches and send them to the database.

E.g.:

using (var bcp = new OracleBulkCopy(connectionString))
{
    bcp.BatchSize = 5000;
    bcp.DestinationTableName = "MyTable";

    // For each source/target column pair, add a mapping
    bcp.ColumnMappings.Add("ColumnA", "ColumnA");

    var reader = excelCommand.ExecuteReader();

    bcp.WriteToServer(reader);
}
Panagiotis Kanavos
  • Thanks for the answer Panagiotis. Unfortunately, Oracle.ManagedDataAccess doesn't support OracleBulkCopy, and dealing with Oracle.DataAccess.dll is a mess: https://stackoverflow.com/questions/659341/the-provider-is-not-compatible-with-the-version-of-oracle-client I would rather stick with my manual batch insertion. Thanks for your help. – Andreas Lymbouras Sep 27 '17 at 14:34
  • @AndreasLympouras OracleBulkCopy is what makes bulk operations faster. You *still* have to avoid loading everything into memory, batch the rows and send them to the database in batches – Panagiotis Kanavos Sep 27 '17 at 14:48