On our project we're using NEST to insert data into Elasticsearch (1.7). We'd like to be able to force ES to truncate all dates to the mapped format.

Mapping example:

"dateFrom" : { 
  "type": "date",
  "format": "dateHourMinute" // Or yyyy-MM-dd'T'HH:mm
}

Data example:

{
  "dateFrom" : "2015-12-21T15:55:00.000Z"
}

Inserting this data throws an IllegalArgumentException:

Invalid format: "2015-12-21T15:55:00.000Z" is malformed at ":00.000Z"

Obviously we don't need the last part of the date. Can't we configure ES to just truncate it instead of erroring out?

Keep in mind we're using 1.7 right now, since date formatting seems to have changed in recent versions...

shmow
  • Did you have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html? There seems to be a date_time format that may fit your data. – Guy Bouallet Dec 22 '15 at 15:24
  • What are you trying to accomplish? Do you want to strip off the time, or are you just trying to parse the date to get the data in there? There are different formats that will accept this value, but I'm not clear on how you are contributing data. Are you trying to standardize the format before indexing? Will the dates always be in this format, or is this an outlier and dates are contributed in various formats? – rlcrews Dec 22 '15 at 15:48
  • The input accepts multiple types of data from files (EDIFACT). For some we've written our own parser, and for the XML one we're using XSDs to generate classes. The problem is that the XSD classes force full DateTimes, which include seconds and fractions, so NEST will save dates like "2015-12-21T15:55:57.123Z". This messes with our aggregations, as they should only go down to minutes. We could probably write a wrapper to catch the XML classes and cut their dates down to minutes (e.g. "2015-12-21T15:55"), but letting ES auto-truncate seems way easier. I'm also curious whether ES can do this at all... – shmow Dec 23 '15 at 08:04

2 Answers

To get the data to index correctly, you could change the field's format to date_optional_time (supported in 1.7):

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "date_optional_time"
        }
      }
    }
  }
}

This allows you to submit dates with the time part being optional, such as:

PUT /my_index/my_type/1
{
   "date": "2015-12-21"
}

or, as you have it:

PUT /my_index/my_type/2
{
   "date": "2015-12-21T15:55:00.000Z"
}

Both are now valid submissions. Note that this only makes the parser accept the value; the date is still indexed at full precision and nothing is truncated. I don't know of any approach within ES that truncates or transforms field data at index time. If you want to parse the data and strip the time before submission, you will need to do that outside of ES when you create the JSON object.
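As a rough illustration of doing the truncation outside of ES, here is a minimal sketch in Python (not NEST/C#, which the question uses). The helper name `truncate_to_minute` is hypothetical, and it assumes the incoming values are ISO 8601 strings like the one in the question and that minute precision in UTC is what the mapping expects.

```python
from datetime import datetime, timezone


def truncate_to_minute(value: str) -> str:
    """Return an ISO 8601 timestamp cut down to yyyy-MM-dd'T'HH:mm (hypothetical helper)."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Normalize offset-aware values to UTC so every document uses the same zone.
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc)
    # Formatting without seconds drops both the seconds and the fraction.
    return parsed.strftime("%Y-%m-%dT%H:%M")


# Build the document body with the truncated value before sending it to ES.
doc = {"dateFrom": truncate_to_minute("2015-12-21T15:55:57.123Z")}
print(doc)
```

The resulting string matches the dateHourMinute pattern from the question's mapping, so the original mapping could stay as it is.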

rlcrews

It appears ES is currently not capable of editing dates through a custom mapping. We ended up using JsonConverters (like this) to drop seconds and milliseconds before inserting the dates into ES.
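The converter idea is not tied to Json.NET. As a sketch of the same pattern in Python (an analogue only, not the converter we actually used), a custom encoder can cut every datetime down to minutes at serialization time. The class name `MinutePrecisionEncoder` is made up for this example.

```python
import json
from datetime import datetime


class MinutePrecisionEncoder(json.JSONEncoder):
    """Serialize datetime values as yyyy-MM-dd'T'HH:mm (hypothetical example class)."""

    def default(self, obj):
        if isinstance(obj, datetime):
            # Formatting without seconds drops seconds and fractions of a second.
            return obj.strftime("%Y-%m-%dT%H:%M")
        # Anything else falls back to the standard behaviour (raises TypeError).
        return super().default(obj)


# The generated classes keep their full-precision datetimes;
# only the serialized JSON is truncated.
source = {"dateFrom": datetime(2015, 12, 21, 15, 55, 57, 123000)}
payload = json.dumps(source, cls=MinutePrecisionEncoder)
print(payload)
```

Doing this in the serializer means the parsers and generated classes do not need to change; only the JSON sent to ES is affected.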

shmow