Every time I Google this, I find the "river" approach, which is deprecated. I'm using Dapper, if that's helpful information.
So what's the solution for this these days?
Your question is on the broad side, so here are pointers to some options.
Elasticsearch is used to search and analyse the data; the real problem is getting the data out of your database and into Elasticsearch, and keeping the two in sync.
In the article Deprecating Rivers:
Client libraries
For more than a year, we've had official client libraries for Elasticsearch in most programming languages. That means hooking into your application and getting data in through the existing codebase should be relatively simple. This technique also allows you to easily munge the data before it gets to Elasticsearch. A common example is an application that already uses an ORM to map the domain model to a database; hooking in and indexing the domain model back to Elasticsearch tends to be simple to implement.
There's extensive documentation on how to use Elasticsearch in:
The docs will address the following:
Install the package:
PM> Install-Package Elasticsearch.Net
var node = new Uri("http://mynode.example.com:8082/apiKey");  // the node to connect to
var config = new ConnectionConfiguration(node);               // connection settings
var client = new ElasticsearchClient(config);                 // low-level client
From there, the indexing logic is something you develop yourself in the application layer; a minimal sketch follows.
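As an illustration only, here is a rough sketch of that approach using Dapper to read rows and the low-level client to index them. The Product table, POCO, index name, and connection strings are all made up for the example; the Index(index, type, id, body) call is the low-level client's documented overload.

using System;
using System.Data.SqlClient;
using Dapper;
using Elasticsearch.Net;

public static class ProductIndexer
{
    // Hypothetical domain object; substitute your own model.
    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public decimal Price { get; set; }
    }

    public static void ReindexAll()
    {
        var client = new ElasticsearchClient(
            new ConnectionConfiguration(new Uri("http://localhost:9200")));

        using (var db = new SqlConnection("Server=.;Database=Shop;Trusted_Connection=True;"))
        {
            // Read the rows with Dapper, then index each one by primary key.
            foreach (var p in db.Query<Product>("SELECT Id, Name, Price FROM dbo.Product"))
            {
                client.Index("products", "product", p.Id.ToString(), p);
            }
        }
    }
}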
Plugins
Logstash can also be used instead of Rivers; various plugins have been developed for it.
Also, Logstash, or similar tools, can be used to ship data into Elasticsearch. For example, some of the rivers Elasticsearch came with are now implemented as Logstash plugins (like the CouchDB one) in the forthcoming Logstash 1.5.
Extra reading
Although it targets a different language and framework, the blog post Advanced Search for Your Legacy Application by David Pilato may be helpful to browse. He recommends doing the indexing in the application layer.
To address issues raised in the comments:
SQL Server provides built-in systems to track data changes, an effective means of detecting changes automatically without having to implement manual change-detection logic.
There are two ways to achieve this:
Using Change Data Capture:
Data changes are recorded along with when they were made, so the full history of changes can be queried.
Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by using an asynchronous process that reads the transaction log and has a low impact on the system.
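For illustration, a rough sketch of reading CDC changes with Dapper. It assumes CDC is already enabled for dbo.Product (capture instance dbo_Product), which is not shown here, and the connection string is a placeholder.

using System.Data.SqlClient;
using Dapper;

public static class CdcReader
{
    // Reads every change CDC has recorded so far for the dbo_Product capture instance.
    public static void ReadChanges(string connectionString)
    {
        using (var db = new SqlConnection(connectionString))
        {
            // __$operation: 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
            var changes = db.Query(
                @"DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn('dbo_Product');
                  DECLARE @to   binary(10) = sys.fn_cdc_get_max_lsn();
                  SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Product(@from, @to, N'all');");
            // Push `changes` into Elasticsearch here.
        }
    }
}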
Using Change Tracking:
This has less overhead, but does not keep a history of changes: you can see which rows changed since your last sync, but nothing further back.
Change tracking captures the fact that rows in a table were changed, but does not capture the data that was changed. This enables applications to determine the rows that have changed with the latest row data being obtained directly from the user tables. Therefore, change tracking is more limited in the historical questions it can answer compared to change data capture. .../...
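A rough sketch of the change-tracking variant with Dapper, assuming change tracking is enabled on dbo.Product and that your application persists the last synced version itself (the connection string and column names are placeholders):

using System.Data.SqlClient;
using Dapper;

public static class ChangeTrackingReader
{
    public static void SyncChangedRows(string connectionString, long lastSyncVersion)
    {
        using (var db = new SqlConnection(connectionString))
        {
            // CHANGETABLE returns the keys of rows changed since lastSyncVersion;
            // join back to the user table for the current column values.
            // SYS_CHANGE_OPERATION is 'I', 'U' or 'D'; deleted rows join to NULLs.
            var changed = db.Query(
                @"SELECT ct.Id, ct.SYS_CHANGE_OPERATION, p.Name, p.Price
                  FROM CHANGETABLE(CHANGES dbo.Product, @lastVersion) AS ct
                  LEFT JOIN dbo.Product AS p ON p.Id = ct.Id;",
                new { lastVersion = lastSyncVersion });

            // Remember where the next sync should start from.
            long newVersion = db.ExecuteScalar<long>("SELECT CHANGE_TRACKING_CURRENT_VERSION();");
            // Index `changed` into Elasticsearch, then persist newVersion for the next run.
        }
    }
}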
You can use Logstash to do the job. Simply use the Logstash JDBC input plugin to set up a Logstash pipeline. Follow this link: Migrate MySQL data to ElasticSearch
Also check out this repo on GitHub: ElasticSearchCRUD
So, just my 2¢ on implementing this. In the past I've done it by setting a trigger that writes to a buffer table acting as an event log. Then a serverless function (AWS Lambda) on a timer would clear out that event log and push the needed changes into ES. This way I didn't have to do anything crazy in the trigger or really even change my original code. A sketch of the consumer side is below.
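A rough sketch of the consumer half of that pattern, draining a hypothetical dbo.ProductChangeLog buffer table that a trigger fills; the table, its columns, and the index name are all made up for the example.

using System.Data.SqlClient;
using Dapper;
using Elasticsearch.Net;

public static class ChangeLogConsumer
{
    // Drains the buffer table the trigger fills and applies each event to ES.
    public static void Drain(string connectionString, ElasticsearchClient client)
    {
        using (var db = new SqlConnection(connectionString))
        {
            // Take a batch of pending events, oldest first.
            var events = db.Query(
                @"SELECT TOP 500 EventId, ProductId, Operation
                  FROM dbo.ProductChangeLog ORDER BY EventId;");
            foreach (var e in events)
            {
                if ((string)e.Operation == "D")
                {
                    client.Delete("products", "product", e.ProductId.ToString());
                }
                else
                {
                    // Re-read the current row and index it (covers insert and update).
                    var row = db.QuerySingle(
                        "SELECT Id, Name, Price FROM dbo.Product WHERE Id = @id;",
                        new { id = e.ProductId });
                    client.Index("products", "product", row.Id.ToString(), (object)row);
                }
                // Remove the event only after it has been applied to the index.
                db.Execute("DELETE FROM dbo.ProductChangeLog WHERE EventId = @id;",
                    new { id = e.EventId });
            }
        }
    }
}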
I've come across this post multiple times and feel it needs an updated answer.
For shipping data from an MSSQL instance into Elasticsearch I use Logstash, which is part of the ELK stack. You define individual pipelines and configurations using the jdbc input plugin.
Here is an example config file. It runs a stored procedure every 2 minutes and inserts the data into the correct index. Keep in mind that you need some way to sync only new or changed records, otherwise you'll run into scaling issues as the data grows; see the note after the config.
input {
  jdbc {
    id => "index_name"
    # Credentials are supplied in the connection string, hence jdbc_user is nil.
    jdbc_connection_string => "jdbc:sqlserver://${sql_server};database=api;user=<username>;password=<password>;applicationname=logstash"
    jdbc_user => nil
    jdbc_driver_library => "/var/tmp/mssql-jdbc-6.2.2.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Run every 2 minutes.
    schedule => "*/2 * * * *"
    statement => "exec stored_procedure"
    lowercase_column_names => false
  }
}
output {
  elasticsearch {
    id => "index_name"
    hosts => "elasticsearch:9200"
    index => "index_name"
    # Use the primary key as the document id so re-runs update rather than duplicate.
    document_id => "%{primary_key}"
  }
}
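On the note above about syncing only new records: one option (an untested sketch, with placeholder table and column names) is the jdbc input's built-in tracking, which persists the last seen value between runs and substitutes it into the statement as :sql_last_value.

input {
  jdbc {
    # ...same connection/driver settings as above...
    # Only select rows changed since the last run; with use_column_value,
    # :sql_last_value is the highest tracking_column value seen so far.
    statement => "SELECT * FROM dbo.Product WHERE ModifiedAt > :sql_last_value"
    use_column_value => true
    tracking_column => "modifiedat"
    tracking_column_type => "timestamp"
  }
}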
Even though the question asks about syncing from MSSQL to Elasticsearch, I feel the basic idea of syncing across heterogeneous systems is much the same.
You may need to
The article Continuous data sync across Heterogeneous Systems - YoursAndMyIdeas explains how to achieve this in more detail.