I have been given a 15 GB .txt file that is formatted like this:
{
"_score": 1.0,
"_index": "newsvit",
"_source": {
"content": " \u0641\u0647\u06cc\u0645\u0647 \u062d\u0633\u0646\u200c\u0645\u06cc\u0631\u06cc: ",
"title": "\u06a9\u0627\u0631\u0647\u0627\u06cc \u0642\u0627\u0644\u06cc\u0628\u0627\u0641 ",
"lead": "\u062c\u0627\u0645\u0639\u0647 > \u0634\u0647\u0631\u06cc -
\u0645\u06cc\u0632\u06af\u0631\u062f\u06cc \u062f\u0631\u0628\u0627\u0631\u0647 .",
"agency": "13",
"date_created": 1494518193,
"url": "http://www.khabaronline.ir/(X(1)S(bud4wg3ebzbxv51mj45iwjtp))/detail/663749/society/urban",
"image": "uploads/2017/05/11/1589793661.jpg",
"category": "15"
},
"_type": "news",
"_id": "2981643"
}
{
"_score": 1.0,
"_index": "newsvit",
"_source": {
"content": "\u0645/\u0630",
"title": "\u0645\u0639\u0646\u0648\u06cc\u062a \u062f\u0631 \u0639\u0635\u0631 ",
"lead": "\u0645\u062f\u06cc\u0631 \u0645\u0624\u0633\u0633\u0647 \u0639\u0644\u0645\u06cc \u0648 \u067e\u0698\u0648\u0647\u0634\u06cc \u0627\u0628\u0646\u200c\u0633\u06cc\u0646\u0627 \u062f\u0631 .",
"agency": "1",
"date_created": 1494521817,
"url": "http://www.farsnews.com/13960221001386",
"image": "uploads/2017/05/11/1713799235.jpg",
"category": "20"
},
"_type": "news",
"_id": "2981951"
}
....
and I want to import it into Elasticsearch. I have tried the Bulk API, but since it only accepts newline-delimited JSON (an action metadata line followed by a source line for each document), I can't convert the whole 15 GB file into that format in one go. I also tried Logstash, but then fields like content weren't searchable or queryable.
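For reference, this is the direction I was going with the official Python elasticsearch client before giving up: read the file one record at a time and push the records in chunks with the bulk helper. It's only a rough sketch; the cluster address, the file path news.txt, and the assumption that every record begins with "{" and ends with "}" on its own line (as in the sample above) are placeholders, not guaranteed by the real file.

import json
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

def read_docs(path):
    # Accumulate lines until a record's closing "}" on its own line
    # (as in the sample), then parse just that record, so the 15 GB
    # file is never loaded into memory as a whole.
    buf = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            buf.append(line)
            if line.rstrip() == "}":
                yield json.loads("".join(buf))
                buf = []

def actions(path, index="newsvit"):
    # Reuse the original _id so re-running the import overwrites
    # documents instead of duplicating them.
    for doc in read_docs(path):
        yield {"_index": index, "_id": doc["_id"], "_source": doc["_source"]}

# streaming_bulk sends the documents in fixed-size chunks instead of
# building one gigantic bulk request.
for ok, item in helpers.streaming_bulk(es, actions("news.txt"), chunk_size=1000):
    if not ok:
        print("failed:", item)

This does index documents on a small test slice, but I'm not sure it is the right approach for a file this size.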
What's the most efficient way to import this file into Elasticsearch?