2

As you probably know, inserting data into a table the "Entity Framework" way is incredibly slow when you try to insert a lot of rows. One alternative is SqlBulkCopy, which does a great job of increasing performance. The problem is that SqlBulkCopy (from what I've read and tested) doesn't support the SQL geometry type in .NET Core.

This is in EF Core, and the C# property type is an NTS Geometry, which cannot be changed.

Using the old library Microsoft.SqlServer.Types is not an option because it doesn't work in .NET Core. The data is currently loaded as NTS Geometry from another SQL Server database.

Has anyone found an efficient way to insert many rows?

Johan
  • 1,260
  • 16
  • 34
  • I'd recommend [linq2db](https://linq2db.github.io/articles/sql/Bulk-Copy.html) – Alexander Petrov Jul 14 '21 at 12:18
  • See [EF Core Tools & Extensions](https://learn.microsoft.com/en-us/ef/core/extensions/) - search for term "bulk". There are several extensions that can do this. – Alexander Petrov Jul 14 '21 at 12:19
  • But do they support geometry? – Caius Jard Jul 14 '21 at 12:20
  • SqlBulkCopy *does* support Geometry. The field/column/property's type should be `SqlGeometry`. Instead of loading all data into a DataTable, essentially creating a duplicate, you can use FastMember's `ObjectReader` to generate an `IDataReader` wrapper over any `IEnumerable<T>` – Panagiotis Kanavos Jul 14 '21 at 12:59
  • Using `SqlGeometry.STGeomFromText` is enough to generate a SqlGeometry from a WKT string. Where does the source data come from? If it can be exposed as an `IEnumerable`, a simple `Select` would be enough to add a `SqlGeometry` property before the data is sent to `SqlBulkCopy`, eg `var geoData=data.Select(r=>new {r.Field1, r.Field2,....,Geometry= SqlGeometry.STGeomFromText(r.GeoField)}); var reader=ObjectReader.Create(geoData); ... bcp.WriteToServer(reader);`. – Panagiotis Kanavos Jul 14 '21 at 13:05
  • @PanagiotisKanavos I edited the question; using SqlGeometry is not an option. The classes are generated using scaffolding and the geometry is of type Geometry (from NetTopologySuite) – Johan Jul 14 '21 at 13:10
  • @Johan why are you using NTS for a *bulk import*? Where does the data come from? If you have to perform multiple complex transformations you'll end up with slow performance *again*. Serializing an NTS geometry to *binary* means you a) had to parse the input to NTS first, then serialize it back to binary and b) you could have used a `byte[]` property from the start. – Panagiotis Kanavos Jul 14 '21 at 13:14
  • @PanagiotisKanavos I need to have them as DbSets, since I want to be able to do other EF-related operations later on single rows. And I don't think having two models of each class that has a geometry column is a good solution – Johan Jul 14 '21 at 13:15
  • @Johan assuming eg the data came from NpgSQL, you still need just a `Select`, in this case calling `... Geometry=byteWriterGeometry.Write(x.GeoField)`. You still avoid storing the entire data set in memory – Panagiotis Kanavos Jul 14 '21 at 13:16
  • @Johan there are no classes in an ETL job. The delay is caused by using EF Core in the first place. ORMs are *not* meant for ETL jobs. There are no entities to map there, only Tables, Rows, Fields and Transformations. Anyway, even if you use EF Core, you *still* need to convert the data into either a DataTable, thus duplicating the data, or an IDataReader, which just reads the existing data – Panagiotis Kanavos Jul 14 '21 at 13:17
  • @PanagiotisKanavos The data comes from another SQL Server database and is already in the NTS format when fetched from there (also using EF Core) – Johan Jul 14 '21 at 13:19
  • @Johan why are you using EF Core *at all* instead of SSIS or direct queries then? No, the data isn't in NTS form; it's SQL Server geometry values. It doesn't have to be mapped. By the time you load the source data you could have already copied half the data to the destination. That's why ORMs aren't used for ETL. Never mind that loading everything in RAM doesn't scale – Panagiotis Kanavos Jul 14 '21 at 13:21
  • @PanagiotisKanavos I know I still need to convert it to a DataTable, but it will still save loads of time compared to EF's built-in SaveChanges(). Moving away from EF Core entities and NTS now would be a major rebuild of the entire system. Hence, a way to use SqlBulkCopy with NTS Geometry is what I'm looking for. – Johan Jul 14 '21 at 13:24
  • @Johan The duplicates show how to do that. All you need is to save the NTS bytes to a `byte[]` field. You don't need a DataTable – Panagiotis Kanavos Jul 14 '21 at 13:25
  • As for `a major rebuild of the entire system.` you're talking about `var reader=cmd1.ExecuteReader(); ... bcp.WriteToServer(reader);`. You can already retrieve an `IDataReader` from the source. Yes, you do need a rebuild because the wrong tool was used (a pickaxe when you needed a tunnel borer), but doing this properly is *several* orders of magnitude faster. There's a reason DBAs don't use ORMs – Panagiotis Kanavos Jul 14 '21 at 13:29
  • @PanagiotisKanavos We do a lot of transformations like cutting the geometries and such in C# between fetching them and saving them. Doing that logic in an SSIS package or in SQL queries would be a massive pain to both write and later to manage, if even possible. – Johan Jul 14 '21 at 13:31
  • 2
    @PanagiotisKanavos, why close? – Svyatoslav Danyliv Jul 14 '21 at 13:38
  • @Johan you missed the point. You can load the data in a `List<>` and process it. It's inefficient, but possible. It's *EF Core* that offers nothing here. As for `if even possible` ETL services like SSIS were created for exactly this reason in the 1990s. ORMs were *never* meant for such jobs. You're dealing with a stream of data that needs to be transformed, not a graph of objects. If anything, the venerable DataTable is **far better** than EF Core for in-memory transformations. Adding or modifying columns is cheap. Transformations, while not cheap, are a lot cheaper. – Panagiotis Kanavos Jul 14 '21 at 14:36
  • @PanagiotisKanavos I don't doubt that there are more efficient ways to do this, nor that converting to DataTables is a waste of time and memory. This is a big data-processing system in production with hundreds of thousands, if not millions, of lines of code. We're fetching data from 4 different databases and doing a lot of calculations and geometry editing based on many attributes from many tables. Rewriting it as an SSIS service would take years. Moving away from EF Core would also take a massive amount of time. I'm not looking for the 100 % most performant solution... – Johan Jul 14 '21 at 14:47
  • ... If we can get 50 or 70 % there with not too much effort that would be more than enough. I believe Jonathan provided such an answer. – Johan Jul 14 '21 at 14:47
  • @Johan that answer bypasses EF Core completely and creates a DataTable with a copy of the data. The library code on Github, and the answer copies just that code. I had to look it up to answer another question about bulk imports yesterday. So no matter what you do, you can't use EF Core with SqlBulkCopy. You have to use ADO.NET classes – Panagiotis Kanavos Jul 14 '21 at 14:56
  • @PanagiotisKanavos His answer converts from NTS (which is supported by EF Core) to something a DataTable can handle, which is exactly what I was looking for but hadn't found as clearly written elsewhere. We were already converting our other tables (which didn't have any geometries) to DataTables for bulk insert, so this way would be the least effort. Most of the examples for SqlBulkCopy that I've found use DataTables, so maybe that's the reason why people tend to go that way. – Johan Jul 14 '21 at 15:13
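For reference, the streaming approach suggested in the comments above (FastMember's ObjectReader feeding SqlBulkCopy directly, without a DataTable) might look roughly like this. This is only a sketch: it assumes FastMember, Microsoft.Data.SqlClient and NetTopologySuite are referenced, and `entities`, the connection string, table and column names are all placeholders.

```csharp
using System.Linq;
using FastMember;
using Microsoft.Data.SqlClient;
using NetTopologySuite.IO;

var writer = new SqlServerBytesWriter();

// Project each entity once, serializing the NTS geometry on the fly.
var rows = entities.Select(e => new { e.Id, Shape = writer.Write(e.Geometry) });

using var connection = new SqlConnection("your connection string");
connection.Open();

using var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "YourTable" };

// ObjectReader exposes the IEnumerable as an IDataReader, so rows are
// streamed to the server instead of being duplicated in memory first.
using var reader = ObjectReader.Create(rows, "Id", "Shape");
bulkCopy.WriteToServer(reader);
```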

3 Answers

3

Disclaimer: I'm one of the creators of linq2db and the linq2db.EntityFrameworkCore extension.

linq2db itself has no dependency on NetTopologySuite, so the library has to be told how to convert such types. Configure the converters ONCE per application (I hope I've covered everything):

var writer = new NetTopologySuite.IO.SqlServerBytesWriter() { IsGeography = true };

MappingSchema.Default.SetConverter<Point, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<Polygon, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<GeometryCollection, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<LinearRing, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<LineString, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<MultiLineString, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<MultiPoint, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));
MappingSchema.Default.SetConverter<MultiPolygon, DataParameter>(p => new DataParameter(null, writer.Write(p), DataType.Udt));

Then you can use BulkCopy for any entities with any Geometry property:

context.BulkCopy(someEntities);
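If you need to tune the behavior, options can be passed explicitly. A sketch, assuming the `BulkCopyOptions` overload of linq2db's BulkCopy extension:

```csharp
using LinqToDB.Data;
using LinqToDB.EntityFrameworkCore;

// Force the provider-specific path (SqlBulkCopy under the hood on SQL Server)
// rather than multi-row INSERT statements.
context.BulkCopy(
    new BulkCopyOptions { BulkCopyType = BulkCopyType.ProviderSpecific },
    someEntities);
```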
Svyatoslav Danyliv
  • 21,911
  • 3
  • 16
  • 32
  • Went with the DataTable solution for now, but this is definitely something to look into in the future. – Johan Jul 15 '21 at 07:30
2

Disclaimer: I'm the owner of Entity Framework Extensions

> As you probably know, inserting data into a table the "Entity Framework"-way is incredibly slow

That's true and this is the main reason why we created our library (paid library).

Our library supports context.BulkInsert through Entity Framework, and supports SQL geometries as well.


That being said, it is also possible to do this directly through SqlBulkCopy.

For EF Core, you need to convert your value using a SqlServerBytesWriter.

Here is a full example:

var list = // your list

// Serializers for the two SQL Server spatial types.
var byteWriterGeometry = new NetTopologySuite.IO.SqlServerBytesWriter();
var byteWriterGeography = new NetTopologySuite.IO.SqlServerBytesWriter() { IsGeography = true };

var dt = new DataTable();
dt.Columns.Add("Geometry", typeof(object));
dt.Columns.Add("Point", typeof(object));

list.ForEach(x =>
{
    // Convert each NTS value to SQL Server's binary format.
    dt.Rows.Add(byteWriterGeometry.Write(x.Geometry), byteWriterGeography.Write(x.Point));
});

using var connection = new SqlConnection("your connection string");

connection.Open();

using var bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "your table name";

bulkCopy.ColumnMappings.Add("Geometry", "Geometry");
bulkCopy.ColumnMappings.Add("Point", "Point");

bulkCopy.WriteToServer(dt);
Jonathan Magnan
  • 10,874
  • 2
  • 38
  • 60
  • Instead of creating a clone of the data in a DataTable, it's better to use FastMember's `ObjectReader` to create an `IDataReader` over the data. Using a `DataTable` is impractical for lots of data and wastes time and memory generating the duplicate instead of sending the data to the database – Panagiotis Kanavos Jul 14 '21 at 12:50
  • PS that's a problem with the package as well – Panagiotis Kanavos Jul 14 '21 at 12:51
  • Thanks Jonathan, I'll give that a shot and get back here. And thanks a lot for giving an alternative to the library despite you being the author. – Johan Jul 14 '21 at 13:00
0

The key for bulk insert with NetTopologySuite entities (of type geometry, geography) is:

  • Set your datatable's column type to object (or your entity's property type to object)
  • dtTable.Columns.Add("Shape", typeof(Object)); //force object
  • Use the NetTopologySuite.IO.SqlServerBytesWriter() object to convert your shape to a byte array (byte[]) and set your object column / property to this byte array
  • var byteWriterGeometry = new NetTopologySuite.IO.SqlServerBytesWriter(); dataRow["geometry"] = byteWriterGeometry.Write(shape);
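The two points above can be sketched end to end without a database. This is a minimal sketch: the byte[] here merely stands in for the output of SqlServerBytesWriter.Write, and no real bulk copy is performed.

```csharp
using System;
using System.Data;

var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("Shape", typeof(object)); // force object so SqlBulkCopy passes the raw bytes through

// In real code this byte[] would come from:
//   new NetTopologySuite.IO.SqlServerBytesWriter().Write(shape)
byte[] serializedShape = { 0x01, 0x02, 0x03 };
dt.Rows.Add(1, serializedShape);

// The value survives as a byte[], which SQL Server accepts for geometry columns.
Console.WriteLine(dt.Rows[0]["Shape"] is byte[]); // True
```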

Here's a dotnetfiddle using the fantastic Z.BulkOperations library which demonstrates this: https://dotnetfiddle.net/DZiQJG