
I have to import a CSV file (up to 500 MB in size) into an MS SQL database from an ASP.NET Core 2.1 application.

For the import I am using SQL bulk insert and it works, but some uploads take about 20 minutes, even though such import requests will be rare. I am still thinking about how to improve this with some kind of callback: when I start the import I would return a response right away, and a background task would wait for the import to finish and report its success (a rough sketch of this idea follows the code below).

Do you think this is worth doing, or is there a better solution?

    public async Task AddBulk<T>(IDataReader data) where T : class, new()
    {
        // Re-open the connection if it is not usable.
        if (Connection.State == ConnectionState.Broken || Connection.State == ConnectionState.Closed)
        {
            await Connection.OpenAsync();
        }

        using (SqlBulkCopy bulk = new SqlBulkCopy(Connection))
        {
            bulk.DestinationTableName = "Test_Data_Table";
            bulk.BatchSize = 5000;        // rows per round trip
            bulk.EnableStreaming = true;  // stream from the IDataReader instead of buffering it

            // Use the async overload so the request thread is not blocked during the copy.
            await bulk.WriteToServerAsync(data);
            data.Close();
        }
    }
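
To make the idea more concrete, here is a rough sketch of the kind of background processing I have in mind, assuming an in-process hosted service (the names `ImportQueue` and `CsvImportService` are just placeholders, and status reporting/polling is left out):

    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Hosting;

    // Placeholder queue: the controller saves the uploaded CSV to disk, enqueues the path
    // and returns immediately; the hosted service picks it up and runs the bulk import.
    public class ImportQueue
    {
        private readonly ConcurrentQueue<string> _files = new ConcurrentQueue<string>();
        private readonly SemaphoreSlim _signal = new SemaphoreSlim(0);

        public void Enqueue(string filePath)
        {
            _files.Enqueue(filePath);
            _signal.Release();
        }

        public async Task<string> DequeueAsync(CancellationToken ct)
        {
            await _signal.WaitAsync(ct);
            _files.TryDequeue(out var filePath);
            return filePath;
        }
    }

    public class CsvImportService : BackgroundService
    {
        private readonly ImportQueue _queue;

        public CsvImportService(ImportQueue queue) => _queue = queue;

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                var filePath = await _queue.DequeueAsync(stoppingToken);
                // Open the file, wrap it in an IDataReader and call AddBulk here,
                // then record success/failure somewhere the UI can poll.
            }
        }
    }

    // Registered in Startup.ConfigureServices:
    //   services.AddSingleton<ImportQueue>();
    //   services.AddHostedService<CsvImportService>();
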
  • please elaborate and show us a little bit of the code you tried – Pribina May 16 '19 at 09:32
  • First of all, why is a *web app* involved at all? Just use `BULK INSERT` or `bcp` to load the data. Assuming you wanted to create a web interface for that job, the native tools are *still* the best option. – Panagiotis Kanavos May 16 '19 at 09:32
  • have a look at [How to upload large files using MVC 4?](https://stackoverflow.com/a/41815990/2417602) – vikscool May 16 '19 at 09:33
  • What do you mean `CallBack`? What you'd need is a way to execute the import job in the background and notify the user. There are no callbacks involved in this. The job itself could run as [a background job in a hosted service](https://learn.microsoft.com/en-us/aspnet/core/fundamentals/host/hosted-services?view=aspnetcore-2.2). The web UI can poll for the status of that job, or you could use SignalR/WebSockets to send notifications to the web page – Panagiotis Kanavos May 16 '19 at 09:33
  • Maybe you can use the following pattern [long running operation with polling](http://restalk-patterns.org/long-running-operation-polling.html) – ganchito55 May 16 '19 at 09:34
  • If you *really* need to import the data from C#, you could use a library like CsvHelper to read the CSV as a stream using [CsvDataReader](https://joshclose.github.io/CsvHelper/api/CsvHelper/CsvDataReader/) and pass it to SqlBulkCopy's `WriteToServer` (a rough sketch of this follows the comment list). – Panagiotis Kanavos May 16 '19 at 09:37
  • How much time does it take to import the data using bcp, BULK INSERT or SSIS? 20 minutes sounds a bit too much. Are there any indexes on the table? – Panagiotis Kanavos May 16 '19 at 09:39
  • Another option could be to put the import operation in a SQL Server Agent job. Save the file in a drop folder on the server, start the job and poll its status, either from a periodically refreshing page or a background task timer that could eg use SignalR to talk to the browser. Nothing beats bulk imports from local storage for performance. – Panagiotis Kanavos May 16 '19 at 09:42
  • I added the code above. The import with bulk insert takes 20 minutes, but that file contains about 11 million records. Users will upload files to my web service through my user interface, so I must do this with a data access library. But the idea that a request can take up to 20 minutes scares me a lot, and I am trying to find out how not to overload my web service – Andrius May 16 '19 at 09:43
  • @Panagiotis Kanavos well I didn't put indexes on it yet, because they would be useless in the database later. I will tell you a bit more: the data in MS SQL will later be synchronized with a tabular model. I can't push the CSV file directly to the tabular model, so in the middle I am using MS SQL as the data source for the tabular model – Andrius May 16 '19 at 09:46
  • @Andrius SSAS has nothing to do with this question and indexes *do* help during ETL. A lot. They hamper loading the raw data though, which is why they're often disabled before loading. The **important** information is how long it takes to load the table using `bcp` or `bulk insert` though. – Panagiotis Kanavos May 16 '19 at 09:48
  • Where is your SQL Server running? Do your app and SQL Server run on the same machine or in Azure? If in Azure, how many DTUs do you have? – user743414 May 16 '19 at 09:48
  • So the flow is: import data into MS SQL, trigger an Azure Machine Learning R script (the R script calculates aggregate columns and writes them back to the MS SQL database), and later I update the tabular model so the updated data appears in Azure Analysis Services. Maybe I should create another topic for this data flow? – Andrius May 16 '19 at 09:50
  • @Andrius, no, you should explain where the database runs and if you've used `bcp` to measure how long it should take. Using R scripts to calculate aggregates, by the way, is way, way, way slower than doing the same job with SQL, and meaningless anyway if you intend to use SSAS Tabular. It's *Tabular's* job to perform aggregations. In fact, if you use columnstore indexes you may not even need Tabular – Panagiotis Kanavos May 16 '19 at 09:52
  • @user743414 at this moment it's the cheapest database with 10 DTUs. I am making the request from my local web service on my computer to an Azure database in the cloud; later the database and web service will be on the same server – Andrius May 16 '19 at 09:52
  • @Andrius so it's not just SQL Server, it's *SQL Azure*. That's *very* important information that should be *crystal clear* in the title and tags – Panagiotis Kanavos May 16 '19 at 09:53
  • @Panagiotis Kanavos I will update with more detailed information, thank you – Andrius May 16 '19 at 09:55
  • @Andrius Azure SQL has its own bulk import mechanisms. [Check this article](https://social.technet.microsoft.com/wiki/contents/articles/52061.t-sql-bulk-insert-azure-csv-blob-into-azure-sql-database.aspx) for example. You can store the CSV in Blob storage, add it as an external data source and `BULK INSERT` from it (a rough sketch of this follows the comment list). – Panagiotis Kanavos May 16 '19 at 09:55
  • @Panagiotis Kanavos Yes, I was thinking about Blob storage as well, but then how would I get that data into my tabular model? From Azure Machine Learning I can't. Maybe I should drop Azure Machine Learning, but for calculating the aggregate columns I must use R scripts... – Andrius May 16 '19 at 09:58
  • @Andrius this question is about bulk imports, not the *rest* of the flow. You should post a different question about the rest, because I suspect there are a lot of misconceptions. Data warehouses in SQL Server have been using SSAS, OLAP or Tabular, to calculate aggregations for over 20 years without using R. R is useful for machine learning and prediction, not aggregate calculation – Panagiotis Kanavos May 16 '19 at 10:02
  • @Panagiotis Kanavos you are right, I will create a new question and will put the link in the comments soon – Andrius May 16 '19 at 10:12
  • In general the performance of a locally running app that bulk inserts data into an Azure SQL server will be much slower than running both in Azure, because your data first has to be uploaded. So you should do a test with your app running in Azure too. We had similar questions a few years ago; using this library helped a lot: https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader. Reading 1 million records took about ~30 sec on a dev machine, writing them took a few minutes. In Azure (read/bulk insert) it took ~4 minutes with 10k bulk rows and usage of Z.EntityFramework, reading from a BLOB. – user743414 May 16 '19 at 10:56
  • @Panagiotis Kanavos https://stackoverflow.com/questions/56166966/calculating-helping-columns-on-r-script-on-azure-machine-learning-so-they-could – Andrius May 16 '19 at 10:59
  • @user743414 so when the SQL database and web service are on the same server, there is no upload time, correct? – Andrius May 16 '19 at 11:03
  • The upload still happens, but before the processing starts, because the file is already at a location your app can access. In your test the whole runtime includes the time to upload the data (send it to the SQL server). In general your app and the Azure SQL server run inside the same data center, and they have a much faster network than you'll have. – user743414 May 16 '19 at 11:08
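
A rough sketch of the CsvHelper approach mentioned in the comments, assuming the CSV column order matches the table (the connection string, file path and table name are placeholders, and the exact `CsvReader` constructor differs between CsvHelper versions):

    using System.Data.SqlClient;
    using System.Globalization;
    using System.IO;
    using CsvHelper;

    public static class CsvImport
    {
        public static void ImportCsv(string csvPath, string connectionString)
        {
            using (var reader = new StreamReader(csvPath))
            using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
            using (var dataReader = new CsvDataReader(csv))   // exposes the CSV as an IDataReader
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (var bulk = new SqlBulkCopy(connection))
                {
                    bulk.DestinationTableName = "Test_Data_Table";
                    bulk.BatchSize = 5000;
                    bulk.EnableStreaming = true;   // stream rows instead of buffering the whole file
                    bulk.WriteToServer(dataReader);
                }
            }
        }
    }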
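
And a rough sketch of the Azure SQL `BULK INSERT` from Blob storage approach, assuming the external data source and credential have already been created (the data source name, credential, SAS token and file name are placeholders):

    using System.Data.SqlClient;

    public static class BlobBulkInsert
    {
        // One-time setup on the database (T-SQL), shown here as a comment:
        //   CREATE DATABASE SCOPED CREDENTIAL BlobCredential
        //       WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<SAS token without leading ?>';
        //   CREATE EXTERNAL DATA SOURCE CsvBlobStorage
        //       WITH (TYPE = BLOB_STORAGE,
        //             LOCATION = 'https://<account>.blob.core.windows.net/<container>',
        //             CREDENTIAL = BlobCredential);
        public static void ImportFromBlob(string connectionString)
        {
            const string sql =
                "BULK INSERT Test_Data_Table " +
                "FROM 'data.csv' " +
                "WITH (DATA_SOURCE = 'CsvBlobStorage', FORMAT = 'CSV', FIRSTROW = 2);";

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(sql, connection))
            {
                command.CommandTimeout = 0;   // large loads can exceed the 30-second default
                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }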

0 Answers