I'm using Elasticsearch with Logstash, and I want the index to be updated when the database changes, so I decided to use the Logstash schedule option. But every minute, all records from the database table are appended to the index again. Example: the contracts table has 2 rows. After the first minute the total is 2; one minute later the total is 4. How can I solve this?

Here is my config file. I run it with bin/logstash -f contract.conf

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/resource"
        jdbc_user => "postgres"
        jdbc_validate_connection => true
        jdbc_driver_library => "/var/www/html/iltodgeree/logstash/postgres.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * FROM contracts;"
        schedule => "* * * * *"
        codec => "json"
    }
}

output {
    elasticsearch {
        index => "resource_contracts"
        document_type => "metadata"
        hosts => "localhost:9200"
    }
}

1 Answer

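Without a document_id, Elasticsearch generates a new ID for every event, so each scheduled run indexes every row again as a new document. You need to modify your output by specifying the document_id setting, using the ID field from your contracts table. That way, re-indexing a row overwrites its existing document and you'll never get duplicates.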

output {
    elasticsearch {
        index => "resource_contracts"
        document_type => "metadata"
        document_id => "%{ID_FIELD}"
        hosts => "localhost:9200"
    }
}
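
As a concrete sketch, assume the primary key column of contracts is named id (the question does not show the table's columns, so this name is a guess). Note that the jdbc input lowercases column names by default (lowercase_column_names => true), so the field reference should be lowercase even if the column is declared as ID.

output {
    elasticsearch {
        index => "resource_contracts"
        document_type => "metadata"
        # "id" is an assumed column name; use your table's primary key
        document_id => "%{id}"
        hosts => "localhost:9200"
    }
}

With this in place, the second scheduled run should overwrite the same two documents instead of adding two more, so the total stays at 2. Documents that were already indexed with auto-generated IDs are not removed by this change, so you may want to delete and rebuild the index once.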

Also, if you have an update timestamp in your contracts table, you can modify the SQL statement in your input as shown below so that only the records that changed recently are copied:

        statement => "SELECT * FROM contracts WHERE timestamp > :sql_last_value;"
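
To show where that statement fits, here is a sketch of a full input block. It assumes the table has a column named updated_at that is set on every insert and update (a hypothetical name; the question does not show the schema). By default :sql_last_value holds the time of the previous run. With use_column_value and tracking_column set, it should instead hold the updated_at value of the last row returned, which is why the query is ordered by that column.

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/resource"
        jdbc_user => "postgres"
        jdbc_driver_library => "/var/www/html/iltodgeree/logstash/postgres.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # updated_at is an assumed column name; replace it with your own
        statement => "SELECT * FROM contracts WHERE updated_at > :sql_last_value ORDER BY updated_at"
        # track the column value rather than the time of the last run
        use_column_value => true
        tracking_column => "updated_at"
        tracking_column_type => "timestamp"
        schedule => "* * * * *"
    }
}

Keep document_id in the output as well, so that a changed row replaces its existing document rather than creating a second one.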
Val