
I'd like to store large numbers of Python float timestamps as standard PostgreSQL timestamps using a custom SQLAlchemy data type. I saw possible solutions like Defining a table with sqlalchemy with a mysql unix timestamp. What I don't like about that approach is that it's rather inefficient: it converts a float into a time object and back into a timestamp, while this should simply be a "multiply by 1e6, add a constant, and cast to int" conversion, which should time-wise be close to a no-op. Meanwhile,

%timeit float(calendar.timegm(datetime.datetime.utcfromtimestamp(1542098001).timetuple()))

wastes something like 2 µs per call on my machine. Using PostgreSQL's to_timestamp and extract(epoch from timestamp_col) should be much faster and better suited for analytical workloads. Is there a way in SQLAlchemy to apply those conversions at the SQL level automatically for all statements affecting that column? I also considered using a simple float column for storage, but I'd prefer to keep the ability to use the built-in time functions when accessing the data.
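To illustrate what I mean by a near-no-op conversion, here is a minimal sketch of the arithmetic-only mapping between epoch-seconds floats and integer microseconds (the helper names are just for illustration, not part of any library):

```python
def float_ts_to_us(ts):
    """Epoch seconds (float) -> integer microseconds: multiply and cast."""
    return int(round(ts * 1e6))

def us_to_float_ts(us):
    """Integer microseconds -> epoch seconds (float)."""
    return us / 1e6

print(us_to_float_ts(float_ts_to_us(1542098001.25)))  # 1542098001.25
```

No datetime or struct_time objects are constructed, so the cost is essentially a multiplication and a cast.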

import time
from datetime import datetime
from calendar import timegm
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import Column, String, Integer, DateTime, TypeDecorator, create_engine


class Timestamp(TypeDecorator):
    impl = DateTime

    def process_bind_param(self, value, dialect):
        if value is None:
            return
        elif isinstance(value, datetime):
            return value
        elif isinstance(value, (int, float)):
            return datetime.utcfromtimestamp(value)
        raise TypeError("Any form of time object required, but received {}".format(type(value).__name__))

    def process_result_value(self, value, dialect):
        if value is None:
            return None # support nullability
        elif isinstance(value, datetime):
            return float(timegm(value.timetuple()))
        raise TypeError("datetime object required, but received {}".format(type(value).__name__))


Base = declarative_base()

class SystemLog(Base):
    __tablename__ = 'systemlog'
    id = Column(Integer, primary_key=True)
    time = Column(Timestamp, index=True, nullable=False, default=datetime.utcnow)
    type = Column(String(20))
    message = Column(String(2000))


engine = create_engine('sqlite://')
Session = sessionmaker(bind=engine)

s = Session()
Base.metadata.create_all(engine)
s.commit()

start_time = time.time()
sample_data = [SystemLog(time=start_time + i) for i in range(10000)]
s.add_all(sample_data)
s.commit()
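One caveat I noticed in my own round-trip above: `timetuple()` carries no sub-second field, so any fractional seconds that survive `utcfromtimestamp` are truncated on the way back out. A minimal demonstration:

```python
import calendar
from datetime import datetime

ts = 1542098001.25
dt = datetime.utcfromtimestamp(ts)             # keeps microseconds: dt.microsecond == 250000
back = float(calendar.timegm(dt.timetuple()))  # timetuple() drops the microseconds
print(back)  # 1542098001.0 -- the .25 is lost
```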
  • https://stackoverflow.com/questions/38357352/convert-datetime-to-unix-timestamp-in-sqlalchemy-model-before-executing-query Does this work? It's a pretty good solution where you'll work with `datetime`s but store a `Double`. – Pruthvi Kumar Nov 20 '18 at 06:30
  • I came across that one. As I wrote, I'm working with doubles internally already, so just storing them as floats would be no problem. But then I'd be losing the option to nicely use all the inbuilt features for timestamps, so that's why I'd like to avoid that. – Michael Nov 20 '18 at 16:51
  • Are you using the ORM or Core? Could you perhaps include some sample code of a usual session? – Ilja Everilä Nov 22 '18 at 11:52
  • Will add one, one moment. – Michael Nov 22 '18 at 12:42
  • Looks like I somewhat overestimated that effect. While it's noticeable with SQLite and about 17% slower, with PostgreSQL all the other overhead makes it negligible in terms of timing. Maybe I should just keep it as is. – Michael Nov 22 '18 at 13:36

0 Answers