
I have this number: 19576.4125. I want to store it in a NumPy array, and I figured that a dtype with a lower bit count would be better since it's smaller. Is that right?

I tried saving it as a half and as a single, but I don't understand why the number changes.

  • My number: 19576.4125
  • half: 19580.0
  • single: 19576.412
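The precision loss above can be reproduced directly (a minimal sketch; `float16` is NumPy's half and `float32` is its single):

```python
import numpy as np

x = 19576.4125

# half: 10 explicit significand bits, so values near 19576 are spaced 16 apart
print(np.float16(x))
# single: ~24 significand bits (~7 decimal digits) -- close, but still not exact
print(np.float32(x))
# double: ~53 significand bits -- enough that repr shows all the digits above
print(np.float64(x))
```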

This number is generated by a method I wrote to convert a datetime into a float. I could use timestamp, but I don't need seconds or milliseconds, so I wrote my own method that keeps only the date, hours, and minutes. (My database doesn't accept datetimes or timedeltas.)

This is my generator method:

from datetime import datetime


def get_timestamp() -> float:
    # replace() returns a new datetime rather than modifying in place,
    # so the result must be assigned back
    now = datetime.now().replace(microsecond=0, second=0)
    _1970 = datetime(1970, 1, 1, 0, 0, 0)
    td = now - _1970
    days = td.days
    hours, remainder = divmod(td.seconds, 3600)
    minutes, _seconds = divmod(remainder, 60)
    # days in the integer part; hours and minutes as a fraction of a day
    timestamp = days + hours / 24 + minutes / 1440
    return round(timestamp, 4)
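For reference, a hypothetical inverse (not part of the question) that decodes such a float back into a datetime, assuming the days-plus-fraction encoding used above:

```python
from datetime import datetime, timedelta


def from_timestamp(ts: float) -> datetime:
    # integer part is whole days since 1970-01-01;
    # the fraction encodes hours and minutes as a fraction of a day
    days = int(ts)
    minutes = round((ts - days) * 1440)
    return datetime(1970, 1, 1) + timedelta(days=days, minutes=minutes)
```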

How I'm creating the array:

from numpy import array, half, single


__td = get_timestamp()
print(__td)
__array = array([__td], dtype=half)
print(type(__array[0]))
print(__array[0])
__array = array([__td], dtype=single)
print(type(__array[0]))
print(__array[0])

EDITED 08/07 11:02 AM

Hello. As the comments said, this number can't be stored exactly in a half or single type. So how do I save it with the best performance? Is it better to save it as an int (multiplied by 10,000), as a float64, or as a string?

And no, I don't want a better way to save datetimes; I want a better way to save this float number with good performance. But thank you for the other replies.
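One option from the comments, sketched here with a hypothetical `to_minutes` helper: store whole minutes since the epoch as an int64, which is exact (no floating-point rounding) and no larger than a float64:

```python
import numpy as np
from datetime import datetime


def to_minutes(dt: datetime) -> int:
    # whole minutes since 1970-01-01; an exact integer count
    td = dt - datetime(1970, 1, 1)
    return td.days * 1440 + td.seconds // 60


arr = np.array([to_minutes(datetime(2023, 8, 7, 9, 7))], dtype=np.int64)
```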

DazzRick
    It changes the number because when you throw away bits of the number, you're throwing away precision. – Reinderien Aug 07 '23 at 13:33
  • This sounds like premature optimisation. Just use `datetime64[m]`, and [study the documentation](https://numpy.org/doc/stable/reference/arrays.datetime.html). – Reinderien Aug 07 '23 at 13:36
  • 1
    Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – Homer512 Aug 07 '23 at 13:37
  • @Reinderien so I can't save this number as a float? As I said, my database doesn't accept datetimes, so I can only save timestamps, and a full timestamp is unnecessarily large. That's why I created this method – DazzRick Aug 07 '23 at 14:01
  • @Homer512 Yes, it answers part of my doubt. I will edit my question – DazzRick Aug 07 '23 at 14:02
  • Critically: what is your database? – Reinderien Aug 07 '23 at 14:03
  • If you don't need seconds or sub-second accuracy, it sounds to me like your problem can be solved by storing minutes since epoch as an integer. Weird unit but I've seen worse. An Int64 as seconds since epoch will also work and be pretty fast. Just don't use int32 – Homer512 Aug 07 '23 at 14:06
  • @Reinderien My database is Firebase; the only way to save datetimes is as a timestamp – DazzRick Aug 07 '23 at 14:19
  • @Homer512 So the better way is to save the minutes as int64, instead of days as float64? – DazzRick Aug 07 '23 at 14:21
  • `np.array([__td])` should be `np.float64`. But if you are creating the floats individually, why store them in a list? – hpaulj Aug 07 '23 at 14:25
  • @hpaulj because I will save this data in the database afterwards. I'm already using lists, but I want to learn numpy and pandas, so I'm reading from the database and saving into it. The `np.array([__td])` is only representative – DazzRick Aug 07 '23 at 14:28
  • I mean, float64 as seconds since epoch also work but if you are not confident in your use of floating point numbers and its rounding behavior, then better stick with integer values. 32 bit is too small (see [Year 2038 Problem](https://en.wikipedia.org/wiki/Year_2038_problem)), so 64 bit is your best bet. Also works in microseconds since epoch (covers ~300,000 years) – Homer512 Aug 07 '23 at 14:36
  • If I use what you said (minutes), do you think the 2038 problem can occur when I save in 32-bit integers? Because when I put the max int32 value `2_147_483_648` in, the first problematic date is `6053-01-23 02:08:00` – DazzRick Aug 07 '23 at 14:49
  • @Homer512 To correct myself: that's when I put the max number into the method that converts the minutes into a datetime – DazzRick Aug 07 '23 at 15:05
  • `(2**31 - 1)` minutes covers about 4000 years. I wouldn't use that for an archaeological database, but it sounds fine for most other use cases. Keep in mind, though, that minutes is an uncommon data format. Consider the [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) when making that choice – Homer512 Aug 07 '23 at 15:58
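The int32 range discussed in the comments is easy to verify (a quick sanity check, not from the original thread):

```python
from datetime import datetime, timedelta

# the largest int32 value, read as minutes since the epoch,
# lands in the year 6053 -- far beyond the seconds-based 2038 limit
limit = datetime(1970, 1, 1) + timedelta(minutes=2**31 - 1)
print(limit)
```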

2 Answers


I feel like your title question and the body text differ. If your aim is to make a function that converts a datetime to a float (and probably back to a datetime later), you can use this approach as an alternative:

import datetime
# convert to a float timestamp
ts = datetime.datetime.now().timestamp()
# convert back to datetime format
datetime.datetime.fromtimestamp(ts)
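If the seconds really must be dropped, they can be zeroed before conversion (a sketch; note that `replace()` returns a new datetime rather than modifying in place):

```python
from datetime import datetime

now = datetime(2023, 8, 7, 9, 7, 30)
# zero out the unwanted fields, then convert; replace() returns a copy
trimmed = now.replace(second=0, microsecond=0)
ts = trimmed.timestamp()
back = datetime.fromtimestamp(ts)
```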
Suraj Shourie

I modified your function to take a datetime argument:

In [48]: def get_timestamp(now) -> float:
    ...:     #now = datetime.now()
    ...:     now.replace(microsecond=0, second=0)
   ...
    ...:     return round(timestamp, 4)
    ...:     

and made a list of dates:

In [49]: alist = [datetime.now() for _ in range(1000)]

In [50]: timeit alist = [datetime.now() for _ in range(1000)]
885 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

And timed your function, to make an array:

In [51]: arr = np.array([get_timestamp(d) for d in alist])

In [52]: timeit arr = np.array([get_timestamp(d) for d in alist])
7.7 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [53]: arr.nbytes
Out[53]: 8000

and did the same, but using numpy's own conversion to an 8-byte element:

In [54]: barr = np.array(alist,dtype='datetime64[m]')

In [55]: barr.nbytes
Out[55]: 8000

In [56]: timeit barr = np.array(alist,dtype='datetime64[m]')
7.87 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So conversion time is basically the same. In both computation and memory, your function is just as good.

Saving as a 4-byte element (float or int) would cut the memory use in half, but unless you are hitting memory errors with millions of values, the effort is rarely worth it.

datetime64 has already worked out the conversion both ways. I imagine the interface to pandas is also good, though pandas appears to have its own datetime formats and tricks. After all, it's designed to handle timeseries.
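A sketch of that two-way conversion: a `datetime64[m]` array stores int64 minutes since the epoch under the hood, so it round-trips losslessly:

```python
import numpy as np
from datetime import datetime

barr = np.array([datetime(2023, 8, 7, 9, 7)], dtype='datetime64[m]')
# the underlying storage is int64 minutes since 1970-01-01
mins = barr.view('int64')
# ...and the same view converts back without loss
back = mins.view('datetime64[m]')
```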

pandas

In [64]: import pandas as pd

In [65]: df = pd.DataFrame({'a':arr, 'b':barr})

In [66]: df
Out[66]: 
              a                   b
0    19576.3799 2023-08-07 09:07:00
1    19576.3799 2023-08-07 09:07:00
2    19576.3799 2023-08-07 09:07:00
3    19576.3799 2023-08-07 09:07:00
4    19576.3799 2023-08-07 09:07:00
..          ...                 ...
995  19576.3799 2023-08-07 09:07:00
996  19576.3799 2023-08-07 09:07:00
997  19576.3799 2023-08-07 09:07:00
998  19576.3799 2023-08-07 09:07:00
999  19576.3799 2023-08-07 09:07:00

[1000 rows x 2 columns]

In [67]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype        
---  ------  --------------  -----        
 0   a       1000 non-null   float64      
 1   b       1000 non-null   datetime64[s]
dtypes: datetime64[s](1), float64(1)
memory usage: 15.8 KB

Interestingly, if I save the datetime list directly to a dataframe, it's faster:

In [81]: df = pd.DataFrame({'c':alist})

In [82]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   c       1000 non-null   datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 7.9 KB

In [83]: timeit df = pd.DataFrame({'c':alist})
5.29 ms ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
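pandas can also do the same minute truncation as the question's function (a sketch using `Series.dt.floor`):

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'c': [datetime(2023, 8, 7, 9, 7, 30)]})
# drop seconds and microseconds, keeping minute resolution
df['c'] = df['c'].dt.floor('min')
```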
hpaulj