
We have a MongoDB collection which we want to import into Elasticsearch (for now as a one-off effort). To this end, we have exported the collection with mongoexport. The result is a huge JSON file with entries like the following:

    {
     "RefData" : {
      "DebtInstrmAttrbts" : {
        "NmnlValPerUnit" : "2000",
        "IntrstRate" : {
          "Fxd" : "3.1415"
        },
        "MtrtyDt" : "2020-01-01",
        "TtlIssdNmnlAmt" : "200000000",
        "DebtSnrty" : "SNDB"
      },
      "TradgVnRltdAttrbts" : {
        "IssrReq" : "false",
        "Id" : "BMTF",
        "FrstTradDt" : "2019-04-01T12:34:56.789"
      },
      "TechAttrbts" : {
        "PblctnPrd" : {
          "FrDt" : "2019-04-04"
        },
        "RlvntCmptntAuthrty" : "GB"
      },
      "FinInstrmGnlAttrbts" : {
        "ClssfctnTp" : "DBFNXX",
        "ShrtNm" : "AVGO  3.625  10/16/24 c24 (URegS)",
        "FullNm" : "AVGO 3  5/8  10/15/24 BOND",
        "NtnlCcy" : "USD",
        "Id" : "USU1109MAXXX",
        "CmmdtyDerivInd" : "false"
      },
      "Issr" : "549300WV6GIDOZJTVXXX"
     }
    }

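For reference, the export step was essentially the following command (database and collection names are placeholders here); mongoexport writes one JSON document per line, which suits the line-oriented file input used below:

    mongoexport --db mydb --collection firds --out FIRDS.json
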
We are using the following Logstash configuration file to import this data set into Elasticsearch:

    input {
      file {
        path => "/home/elastic/FIRDS.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => json
      }
    }
    filter {
      mutate {
        remove_field => [ "_id", "path", "host" ]
      }
    }
    output {
      elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "firds"
      }
    }

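We start the import with a plain Logstash run; the path to the configuration file is illustrative:

    bin/logstash -f /etc/logstash/conf.d/firds.conf
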
All this works fine: the data ends up in the index firds of Elasticsearch, and a GET /firds/_search returns all the entries within the _source field. We understand that this field is not indexed and thus not searchable, whereas searchability is exactly what we are after. We want to make all of the entries within the original nested JSON searchable in Elasticsearch.

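In other words, we would like a query along the following lines to return the document shown above (the field name is only an example):

    GET /firds/_search
    {
      "query": {
        "match": {
          "RefData.DebtInstrmAttrbts.NmnlValPerUnit": "2000"
        }
      }
    }
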
We assume that we have to adjust the filter {} part of our Logstash configuration, but how? For consistency reasons, it would be nice to keep the original nested JSON structure, but that is not a must. Flattening would also be an option, so that e.g.

"RefData" : {
      "DebtInstrmAttrbts" : {
        "NmnlValPerUnit" : "2000" ... 

becomes a single key-value pair "RefData.DebtInstrmAttrbts.NmnlValPerUnit" : "2000".

It would be great if we could do that immediately with Logstash, without using an additional Python script operating on the JSON file we exported from MongoDB.

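One direction we have been considering is a ruby filter that walks the nested structure and copies every leaf into a dotted top-level field. This is only an untested sketch (hard-coded to the RefData field), and we do not know whether it is the idiomatic way:

    filter {
      ruby {
        code => '
          # recursively copy nested values into dotted top-level fields
          flatten = lambda do |prefix, value|
            if value.is_a?(Hash)
              value.each { |k, v| flatten.call("#{prefix}.#{k}", v) }
            else
              # the dotted name becomes a literal field name in Logstash
              event.set(prefix, value)
            end
          end
          refdata = event.to_hash["RefData"]
          if refdata.is_a?(Hash)
            flatten.call("RefData", refdata)
            event.remove("RefData")
          end
        '
      }
    }
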
EDIT: Workaround. Our current work-around is to (1) dump the MongoDB database to dump.json, then (2) flatten it with jq using the following expression, and finally (3) manually import it into Elasticsearch.

Ad (2): This is the flattening step:

    jq -c '. as $in | reduce leaf_paths as $path ({}; . + { ($path | join(".")): $in | getpath($path) }) | del(."_id.$oid")' dump.json > flattened.json

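Ad (3): The manual import is then roughly the following, posting one flattened document per line (index and endpoint as in the Logstash configuration above; the loop itself is only illustrative):

    while read -r doc; do
      curl -s -H 'Content-Type: application/json' \
        -X POST 'http://localhost:9200/firds/_doc' -d "$doc"
    done < flattened.json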

Remark for the curious: The JSON shown above is a (modified) entry from the Financial Instruments Reference Database System (FIRDS), available from the European Securities and Markets Authority (ESMA), a European financial regulatory agency overseeing the capital markets.

    Have you tried the `json` filter already? https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html – leandrojmp May 24 '19 at 13:40
  • @leandrojmp We are struggling with finding the correct way of doing that. We found https://stackoverflow.com/questions/22941739/using-json-with-logstash which gives examples (which however did not work). – B--rian May 27 '19 at 10:57
