1

i'm reading data from elasticsearch . when i visualize date format on kibana it shows Date Aug 5, 2020 @ 23:00:00.000 => so it's correct but when i'm reading it from elasticsearch to do some machine learning , i noticed that date format is taken wrong Date 1.596665e+12

i'm collecting the content of the index into a dataframe with pyspark and i can do it in scala if there's any solution

`from elasticsearch import Elasticsearch from pandasticsearch import Select

es = Elasticsearch(['http://localhost:9200'],timeout=600) documents = es.search(index='sub1',body={})

pandas_df = Select.from_dict(documents).to_pandas() print(pandas_df) '

it shows wrong date format , so how can i solve it ? any help ? thank u

sam
  • 13
  • 4

1 Answers1

1

1.596665e+12 equals 1596665000000 which is a unix millisecond timestamp corresponding to Wednesday, August 5, 2020 10:03:20 PM in GMT.

You've essentially got 3 options:

  1. Use a script_field to parse/convert the ts to a human-readable date. Note that you'll need to extract the script fields from the response b/c they're not part of the _source.

  2. Convert the timestamps after you fetch the docs but before you load them into a df (preferably in a loop/list comprehension/map).

  3. Reindex your data w/ already converted timestamps. This can be done from inside of an _update script, you don't need to drop everything.

UPDATE

Implementation of point 2

from elasticsearch import Elasticsearch
from datetime import datetime as dt


def convert_ts(hit):
    hit = hit['_source']

    try:
        ts_from_doc = hit.get('date_field', None)

        if not ts_from_doc:
            raise ValueError('`date_field` not found')

        # incoming as millisec so convert to sec
        as_date = dt.fromtimestamp(
            int(ts_from_doc / 1000.0)
        ).strftime('%Y-%m-%d %H:%M:%S')

        hit['date_field_as_date'] = as_date

    except Exception as e:
        print(e)

    return hit


es = Elasticsearch(['http://localhost:9200'], timeout=600)
documents = es.search(index='sub1', body={})['hits']['hits']
documents = [convert_ts(doc) for doc in documents]

print(documents)

# pandas etc ...

Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68
  • import datetime print( datetime.datetime.fromtimestamp( int("1284105682") ).strftime('%Y-%m-%d %H:%M:%S') ) i found this but i don't know how to apply it on many columns cuz it's not a single variable i need to modify 6 columns .... !! help – sam Jun 11 '20 at 17:52
  • Updated my answer. Good luck! – Joe - GMapsBook.com Jun 11 '20 at 18:17