Scenario: I want to write my own Autocomplete-API for Addresses, just like the One Google is offering. (Very Basic: Street, Housenumber, City, Postcode, Country). It is intended for private use and training-purposes only. I want to cover about 1 Million Addresses for a Start.
Technology Used: .Net Framework (not Core), C#, Visual Studio, OSMSharp, Microsoft SQL-Server, Web Api 2 (although i will probably switch to ASP.Net Core in the Future.)
Approach:
- Set Up Project (Web Api 2 or Console Project for Demo-Purposes)
- Download relevant File from OpenStreetMaps using DownloadClient() (https://download.geofabrik.de/)
- Read in the File using OSMSharp and Filter out relevant Data.
- Convert Filtered Data to a DataTable.
- Use DataTable to feed SQLBulkCopy Method to import Data into Database.
Problem: Step 4 is taking way too long. For a File like "Regierungsbezirk Köln" in the Format osm.pbf which is about 160MB (the uncompressed osm file is about 2.8 GB) where talking about 4-5 Hours. I want to optimize this. The Bulk Copy of the DataTable into the Database on the other Hand (About 1 Million Rows) is taking just about 5 Seconds. (Woah. Amazing.)
Minimal Reproduction: https://github.com/Cr3pit0/OSM2Database-Minimal-Reproduction
What i tried:
Use a Stored Procedure in SQL-Server. This comes with a whole different Set of Problems and i didn't quite manage to get it Working (mainly because the uncompressed osm.pbf File is over 2GB and SQL Server doesn't like that)
Come up with a different approach to Filter and Convert the Data from the File to a DataTable (or CSV).
Use the Overpass-API. Although I read somewhere that the Overpass-API is not intended for DataSets above 10,000 Entries.
Ask the Jedi-Grandmasters on StackOverflow for Help. (Currently in Process ... :D)
Code Extract:
public static DataTable getDataTable_fromOSMFile(string FileDownloadPath)
{
Console.WriteLine("Finished Downloading. Reading File into Stream...");
using (var fileStream = new FileInfo(FileDownloadPath).OpenRead())
{
PBFOsmStreamSource source = new PBFOsmStreamSource(fileStream);
if (source.Any() == false)
{
return new DataTable();
}
Console.WriteLine("Finished Reading File into Stream. Filtering and Formatting RawData to Addresses...");
Console.WriteLine();
DataTable dataTable = convertAdressList_toDataTable(
source.Where(x => x.Type == OsmGeoType.Way && x.Tags.Count > 0 && x.Tags.ContainsKey("addr:street"))
.Select(Address.fromOSMGeo)
.Distinct(new AddressComparer())
);
return dataTable;
}
};
private static DataTable convertAdressList_toDataTable(IEnumerable<Address> addresses)
{
DataTable dataTable = new DataTable();
if (addresses.Any() == false)
{
return dataTable;
}
dataTable.Columns.Add("Id");
dataTable.Columns.Add("Street");
dataTable.Columns.Add("Housenumber");
dataTable.Columns.Add("City");
dataTable.Columns.Add("Postcode");
dataTable.Columns.Add("Country");
Int32 counter = 0;
Console.WriteLine("Finished Filtering and Formatting. Writing Addresses From Stream to a DataTable Class for the Database-SQLBulkCopy-Process ");
foreach (Address address in addresses)
{
dataTable.Rows.Add(counter + 1, address.Street, address.Housenumber, address.City, address.Postcode, address.Country);
counter++;
if (counter % 10000 == 0 && counter != 0)
{
Console.WriteLine("Wrote " + counter + " Rows From Stream to DataTable.");
}
}
return dataTable;
};