  • I currently have a UNIX timestamp as a 64-bit float. It's seconds with a fractional part, such as 1687976937.597064.
  • I need to convert it to nanoseconds. But there are 1 billion nanoseconds in 1 second, and doing a straight multiplication by 1 billion would overflow the 64-bit float.

Let's first consider the limits:

  • 1_687_976_937_597_064_000 is the integer result of the above timestamp multiplied by 1 billion. The goal is figuring out a way to safely reach this number.
  • 9_223_372_036_854_775_807 is the maximum number storable in a 64-bit signed integer.
  • 9_007_199_254_740_992.0 is the maximum number storable in a 64-bit float. And at that scale, there aren't enough bits to store any decimals at all (it's permanently .0). Edit: This claim is not correct. See the end of this post...
  • So a 64-bit signed integer can hold the result. But a 64-bit float cannot hold the result and would overflow.

So I was thinking:

  • Since an integer is able to easily represent the result, I thought I could first convert the integer portion to an integer, and multiply by 1 billion.
  • And then I extract just the decimals so that I get a new 0.XXXXXX float, and multiply that by 1 billion. By leading with a zero, I ensure that the integer portion of the float can never overflow. But perhaps the decimals could still overflow somehow? Hopefully floats will just safely truncate the trailing decimals instead of overflowing. Since a 0.XXXXXX number is below 1, multiplying it by 1 billion can never produce a value of 1_000_000_000 or more, so this multiplication should be safe...
  • After that, I truncate the "decimals float" into an integer to ensure that the result will be an integer.
  • Lastly, I add together the two integers.

It seems to work, but this technique looks so hacky. Is it safe?

Here's a Python repl showing the process:

>>> num = 1687976937.597064
>>> whole = int(num)
>>> whole
1687976937
>>> decimals = num - whole
>>> decimals
0.5970640182495117
>>> (whole * 1_000_000_000)
1687976937000000000
>>> (decimals * 1_000_000_000)
597064018.2495117
>>> int(decimals * 1_000_000_000)
597064018
>>> (whole * 1_000_000_000) + int(decimals * 1_000_000_000)
1687976937597064018
>>> type((whole * 1_000_000_000) + int(decimals * 1_000_000_000))
<class 'int'>
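
If you want this as a reusable function, the same steps might look like this (a minimal sketch; the helper name seconds_to_ns is mine, not from any library):

def seconds_to_ns(num):
    whole = int(num)                  # whole seconds, exact as an int
    decimals = num - whole            # 0.XXXXXX fractional seconds
    # int math is exact; the float multiply only touches the fraction
    return whole * 1_000_000_000 + int(decimals * 1_000_000_000)

>>> seconds_to_ns(1687976937.597064)
1687976937597064018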

So here's the comparison:

  • 1_687_976_937_597_064_018 was the result of the above algorithm. And yes, there's a slight, insignificant float rounding error but I don't mind.
  • 1_687_976_937_597_064_000 is the mathematically correct answer given by Wolfram Alpha's calculator.

It certainly looks like a success, but is there any risk that my algorithm would be dangerous and break?

I am not brave enough to put it into production without confirmation that it's safe.


Concerning the 64-bit float limits: Here are the results in Python 3's repl (pay attention to the 993 input and the 992 in the output):

>>> 9_007_199_254_740_993.0
9007199254740992.0

But perhaps I am reading that "limit" incorrectly... Perhaps this is just a float rounding error.
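
For what it's worth, math.ulp (available since Python 3.9) confirms it's a representation limit rather than a hard cap: at that scale the gap between adjacent floats is exactly 2, so the odd number 9_007_199_254_740_993 has no representation of its own and rounds to its even neighbour:

>>> import math
>>> math.ulp(9_007_199_254_740_992.0)
2.0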

  • Do you know in advance the range the "input" will cover? Is speed a hard requisite? Can you use external, specialized, libs? – Ripi2 Jun 28 '23 at 19:11
  • Multiplying by 1 billion will *not* overflow a 64-bit float, nor will you lose precision. Remember, it's floating point. It will do a multiply, and the exponent will increase, but the number of significant bits should be unchanged. – Tom Karzes Jun 28 '23 at 19:14
  • "would overflow the 64-bit float" It will not. The range of `double` is about 10^308. The age of the Universe in nanoseconds is about 10^27. "9_007_199_254_740_992.0 is the maximum number storable in a 64-bit float" This is not true. – n. m. could be an AI Jun 28 '23 at 19:14
  • @Ripi2 Yeah, the input is always a UNIX timestamp with decimal precision. That's the number of seconds since 1970, which is currently `1_687_979_659`. So the "whole" portion of the number is growing very slowly. The fractions in the float are just the current fractions of a second. As for speed, I'm trying to solve it with native numbers without any "big math" libraries (since the latter usually use very slow and complex methods of doing the math). – Mitch McMabers Jun 28 '23 at 19:17
  • I wouldn't try taking it apart and pasting it back together. It would be simpler to just store the integer part and the nanoseconds in separate variables. – Robert Dodier Jun 28 '23 at 19:18
  • @TomKarzes See the edit on the question though. I added an example of the "float limit". There's something weird going on when exceeding the number I quoted for the "float limit". It truncates the float down to the number I quoted. – Mitch McMabers Jun 28 '23 at 19:22
  • @RobertDodier Unfortunately I need to generate a 64-bit signed integer with the whole number as nanoseconds, for use with the following POSIX API which takes a timestamp as nanoseconds. It's the only way to preserve the fractions of a second that I have in the input (the old `utime` API truncates the decimals): https://man7.org/linux/man-pages/man2/utimensat.2.html – Mitch McMabers Jun 28 '23 at 19:23
  • @MitchMcMabers There's nothing unexpected here. You're seeing slight differences in the low-order digit. That's expected. Remember, your numbers are not being represented exactly. When multiplying by 1 billion, the result will be slightly larger or smaller than the "true" result, so you need to expect this sort of thing. Alternatively, don't use floating point. You can use `decimal.Decimal` or `fractions.Fraction` if you want exact multiplication and addition. – Tom Karzes Jun 28 '23 at 19:29
  • Your conversion is `(int64_t)(timestamp_in_seconds * 1e9)`. Nothing more complicated than that. Or in Python, `int(timestamp_in_seconds * 1e9)`. – n. m. could be an AI Jun 28 '23 at 19:31
  • @TomKarzes Why would anyone need exact multiplication and addition in this application? A `double` timestamp has a microsecond granularity. There is simply no information about the exact number of nanoseconds in there. Why would you perform multiplication with garbage digits? (edit: microsecond not millisecond) – n. m. could be an AI Jun 28 '23 at 19:39
  • It's best to answer your own question by posting an answer rather than editing the answer into the body of the question. – n. m. could be an AI Jun 28 '23 at 19:45
  • @n.m.willseey'allonReddit It has to do with the POSIX APIs. https://man7.org/linux/man-pages/man2/utime.2.html truncates the timestamp to `floor(seconds)`, so even if I told it "1.5 seconds", it would store "1 second" on disk. https://man7.org/linux/man-pages/man2/utimensat.2.html preserves the fractions of a second but requires that the timestamp is delivered as a signed 64-bit integer with the value as nanoseconds. By converting the "seconds" float to nanoseconds int, it will properly preserve values such as "1.5 seconds" without any significant loss of decimal precision. – Mitch McMabers Jun 28 '23 at 19:52
  • @n.m.willseey'allonReddit I'm not saying it makes sense for this application. I was just explaining how exact multiplication and addition can be performed on fractional values. For this application, 64-bit floats should be sufficient. – Tom Karzes Jun 28 '23 at 20:31
  • [There is no overflow for integers in Python](https://docs.python.org/3/c-api/long.html#integer-objects), though. So surely the answer is "just keep using ints"? No need for floats here in the slightest. – Mike 'Pomax' Kamermans Jun 28 '23 at 20:56
  • @Mike'Pomax'Kamermans That taught me something interesting. To add more details to it: Turns out that Python 3's `int` is a dynamic "big integer" implementation which they call `PyLong`. If it fits in a native int (32/64-bit depending on CPU), it uses that, but if it becomes larger than the CPU can fit, then it switches to the `bignum aka PyLong` format which can store arbitrary numbers and is practically just limited by how much RAM you have. Like all "big integer" libraries it's of course slower than a real int. Very fascinating to learn about though! That fact eliminates a lot of issues. – Mitch McMabers Jun 28 '23 at 21:38
  • Plus it doesn't require any special syntax, all arithmetic operators still work just fine. – Mike 'Pomax' Kamermans Jun 28 '23 at 22:48

3 Answers


The thing when working with 64-bit floats is that the number of "trusted" decimal digits is 15. Perhaps the 16th digit is also right, but you can't know.

So when you take (or print) the decimal part of a float, anything beyond the 16th digit is garbage.

You say you need C POSIX utimensat. It requires an array of timespec structures, each of which has two members: one for the number of seconds and one for the number of nanoseconds.

The first member (seconds) allows any integer type, likely "long". But the type used for nanoseconds ("long") is platform-dependent: it can be a 32- or 64-bit signed integer.
So the available range must be watched.

You can see the limits of the types in the C docs.

Fortunately a 32-bit long can hold 9 full digits (plus a leading 0, 1, or 2). This is enough to hold the nanoseconds part (exactly 9 digits).

So far, so good. You can use 32-bit integers; no 64-bit needed.

The key now is how to split the float into two integers.
One way is to transform the number into a string, adding the required zeros for the decimal part. Then take the parts before and after the decimal point, convert them into integers, and you are done.

Without strings, the whole part is easy: a simple int(number).
For the decimal part you can subtract num - whole. Then multiply by 10^9, convert to an integer, and truncate at the right position, which is exactly after the 9th digit.
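
A minimal sketch of that arithmetic split in Python (variable names are mine):

num = 1687976937.597064
whole = int(num)                    # whole seconds, exact
frac_ns = int((num - whole) * 1e9)  # fractional seconds scaled to nanoseconds
# frac_ns is always in 0..999_999_999: even the largest fraction below 1.0
# scales to just under 1e9 and rounds down, so it fits in a signed 32-bit int
print(whole, frac_ns)               # 1687976937 597064018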

  • This is a great answer so I upvoted it, even though it was unfortunately based on me not being precise enough: Yes I am using the aforementioned `utimensat` API, but I do it via Python's wrapper: [os.utime()](https://docs.python.org/3/library/os.html#os.utime), which takes two 64-bit signed integers: "`(atime_ns, mtime_ns)` where each member is an int expressing nanoseconds." I really apologize for the confusion caused by my imprecise comment. I think your answer is really well thought out and considerate so the least I could do is upvote it. It's a great answer for what it was aimed at. :) – Mitch McMabers Jun 28 '23 at 21:47

For files, Unix time stamps are only as accurate as the OS's underlying time stamp granularity. This granularity can be 2 seconds, 1 second, milliseconds, microseconds, or nanoseconds. If you convert digits beyond the granularity of the OS (if any), you will get an incorrect nanosecond representation.

Further, a 64-bit float can accurately represent integers only up to 2^53. A nanosecond value can potentially need 64 bits, so not all values can be represented. The usual assumption is that the decimal representation of a 64-bit float is as accurate as an integer for 15 decimal digits (it can be more, but 15 decimal digits will hold all combinations of digits regardless of the binary representation).
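
To illustrate with the exact nanosecond value from the question: at the 10^18 scale, adjacent floats are 256 apart, so a round trip through a float lands on the nearest multiple of 256:

>>> int(float(1_687_976_937_597_064_018))
1687976937597063936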

There are two ways to get around these issues. Assume millisecond file stamps, i.e. a granularity of 1e-3 seconds.

You can use strings to get a rounded value as you expect it to be:

def ts2its(ts):
    sec,_,dec=f'{ts:.3f}'.partition('.') # use locale to get the decimal...
    return int(sec)*1_000_000_000+int(dec)*1_000_000

>>> ts2its(1687976937.597064)
1687976937597000000

Alternatively, for a purely math approach, you can use math.modf in the following manner to drop the digits after milliseconds:

import math

def ts2its2(ts):
    # reliable in all locales and platforms
    frac, whole = math.modf(ts)    # math.modf returns (fractional part, whole part)
    _, ms = math.modf(frac * 1e3)  # keep the whole milliseconds, drop later digits
    return int(whole * 1e9) + int(ms * 1e6)

>>> ts2its2(1687976937.597064)
1687976937597000000

You propose this in your own answer as the solution:

>>> ts=1687976937.597064
>>> int(ts*1_000_000_000)
1687976937597063936

Granted, 99% of the time this is a fine solution. But let's talk about the 1% of the time when it is not.

Let's start by looking at your example time:

 1687976937.597064
 SSSSSSSSSS          Significant under 1 second granularity
 SSSSSSSSSS SSS      Significant under millisecond granularity
 SSSSSSSSSS SSSSS?   The 16th digit is questionably accurate

 S = significant digit

Now look at the result of int(ts*1_000_000_000):

 1687976937597063936
 SSSSSSSSSSSSSSS?XXX

 S = significant
 ? = questionably significant
 X = likely inaccurate

So the issue is really the inverse of your problem statement.

With a 64 bit float:

  1. ANY time stamp can be represented to millisecond accuracy;
  2. MOST time stamps can be represented to microsecond accuracy;
  3. Digit-by-digit accuracy is lost between microsecond and nanosecond time stamps.

A 64-bit float time stamp is really just a convenience data structure for a time stamp. The actual representation of a nanosecond time stamp is OS-dependent, but it is usually a compound data structure with two ints: one for the seconds and another for the fraction.
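
In Python you can see both representations side by side on any existing file (a sketch; the path is hypothetical):

>>> import os
>>> st = os.stat("some_file")
>>> st.st_mtime      # convenience float: seconds since the epoch
>>> st.st_mtime_ns   # the underlying exact value: integer nanoseconds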

  • Fascinating to see it done like that. There's also a built-in library for doing exactly that: https://docs.python.org/3/library/decimal.html. If I go the "treat it as a string" route, I would definitely prefer the official library just to ensure there aren't bugs and gotchas (for example, you mentioned in your example that your formatting string is locale-dependent and would have issues). While it was a cool effort, I'd definitely pick the decimal library instead. :D – Mitch McMabers Jun 28 '23 at 21:41
  • @MitchMcMabers: Decimal is not needed here. Please see pure float version I just added. – dawg Jun 29 '23 at 10:37
  • Nice `math.modf` trick! Annoyingly it returns the "whole" portion as a float instead of int though so it will still have float rounding errors in the significant portion before the decimals. And `1eX` are floats (mult a float by a float, which is slower than mult by an int). I like it, but would do return `int(sec)*1_000_000_000+int(dec*1_000_000_000)`. And I see that you did `1e3` on the line above to move 3 decimals to the front and truncated the rest, hmm, very cool truncation! But precision of "ts as float" depends on the filesystem so we can't assume only 3 decimals matter. Cool though. – Mitch McMabers Jun 29 '23 at 11:16
  • 1
    @MitchMcMabers: *Annoyingly it returns the "whole" portion as a float instead of int though so it will still have float rounding errors in the significant portion before the decimals.* This is not actually true. A 64 bit float can represent integers **exactly** in the range of 2^53. See [here](https://en.wikipedia.org/w/index.php?title=Floating_point&oldid=376101741#IEEE_754:_floating_point_in_modern_computers). So the conversion from milliseconds to nanoseconds is only subject to error after the 16th decimal digit or if you assume the digits after 1e-3 are meaningful... – dawg Jun 29 '23 at 11:42
  • Yeah I'm aware of that, but `2^53` is only `9_007_199_254_740_992` while `1687976937 * 1e9` is `1_687_976_937_000_000_000`, so all of the accurate digits are used up before even reaching the decimals. That's why I decided to use integer math for the whole seconds, and only using (necessary evil) float math for the decimals. I wasn't speaking about the decimals. Edit: Oh I get what you mean. Yeah, the return value of `math.modf()` for the "whole seconds" initially fits within the accurate digits of a float, and is therefore safe to convert to an int to then do the "multiply by 1 billion" step. – Mitch McMabers Jun 29 '23 at 15:12
  • Thanks for the updated version of your answer. I just saw it. Yeah, the "direct multiplication of the float" is the least accurate method. I mentioned that in my answer with examples, and mentioned the more accurate method too (doing the whole seconds as int mult, and doing nanoseconds as a standalone float mult). The thing is, beyond the first 1 or 2 decimals, nobody really cares about timestamp accuracy. 18.25 seconds is practically the same thing as 18.25239382 seconds. That's why I mention the simplified variant first. But the more accurate variant is fast too, as seen in my benchmarks. :) – Mitch McMabers Jun 29 '23 at 15:26
  • @MitchMcMabers: *The thing is, beyond the first 1 or 2 decimals, nobody really cares about timestamp accuracy.* And that, my friends, is how we get things like the Y2K and Y2038 issues. Accuracy DOES matter, especially when relatively trivial to achieve. With high transaction volume, with only 1 or 2 digits of accuracy (less than millisecond accuracy) many transactions will lose their logical creation, read, modify times. This might break things. – dawg Jun 29 '23 at 21:33
  • You're right. For applications where absolute sub-second robustness is required, people should ONLY be using the `stat().st_Xtime_ns` APIs to preserve the EXACT timestamps as perfect integers. But for applications where one or two sub-second digits are all that matters, it's fine to use any of these "float-based" methods. :) I process timestamps as floats because it's more convenient (long story). But I've decided to use two-step conversion for the decimals (the 2nd method in my answer), since it's fast enough, and it gets the rounding errors down into insignificance, so might as well. ;) – Mitch McMabers Jun 29 '23 at 21:50

Floats can have a really large exponent without losing their significant digits. It turns out that floats allow really large multiplications without any issues, as seen here:

>>> 9_000_000_000_000_000_000_000_000_000.0 * 1_000_000_000
9e+36
>>> 9_123_456_789_012_345_678_901_234_567.0 * 1_000_000_000
9.123456789012346e+36
>>> int(9_123_456_789_012_345_678_901_234_567.0 * 1_000_000_000)
9123456789012346228226434616254267392

So basically, the float keeps as many "significant digits" as it can fit internally, truncating the rest (in the left-hand operand in the examples above), and then just scales the exponent. It can roughly represent Unix nanosecond timestamps far larger than the age of the universe.

When it's time to convert it to an integer, you can also see that the float keeps as much precision as it can and does a good job with the conversion. All of the significant digits are there. There's a lot of "random float rounding noise" at the end of the output number, but those digits don't matter.

In other words, I've had a fundamental misunderstanding about the size of numbers that a float can store. It's not limited per se. It just stores a fixed amount of significant digits and then it uses an exponent to reach the desired scale. So a float would suffice here!
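
sys.float_info spells out both halves of that misunderstanding, namely the huge exponent range versus the fixed count of trustworthy decimal digits:

>>> import sys
>>> sys.float_info.max   # the real overflow threshold of a 64-bit float
1.7976931348623157e+308
>>> sys.float_info.dig   # decimal digits guaranteed to survive a round trip
15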

The answer is that I can just do the multiplication directly, and it will be totally safe. Since my multiplier is a straight 1 billion without any fractions, it essentially just scales up the exponent; only the last couple of significant digits can shift due to rounding. Fantastic. :)

Just like this!

>>> int(1687976937.597064 * 1_000_000_000)
1687976937597063936

Although when we use an integer like above, Python actually converts it into a float internally (1_000_000_000 (int) -> 1e9 (float)), since the other operand is a float.

So it's actually about 6% faster to do that multiplication with a float literal directly (avoiding the int -> float conversion of the multiplier):

>>> int(1687976937.597064 * 1e9)
1687976937597063936

As you can see, the result is identical, since both cases do float * float math. The integer version just requires an extra conversion step first, which the latter method avoids.

Let's recap:

  • 1_687_976_937_597_064_018 was the result of my "split" algorithm earlier (in my original question).
  • 1_687_976_937_597_063_936 is the result given by the suggestion to "just trust the float and do the multiply directly".
  • 1_687_976_937_597_064_000 is the mathematically correct answer given by Wolfram Alpha's calculator.

So my "split" technique had a smaller rounding error. The reason why my method was more accurate is because I had "split" my number into "whole" (int) and "decimals/fractions" (float). Which means that my method has full devotion of all significant digits to the decimals, since I had removed "the whole number" before the decimals/fractions. This means that my "decimals" float was able to devote all significant digits to properly representing the decimals with much greater precision.

But these are UNIX timestamps represented as nanoseconds, and nobody really cares about the "fractions of a second" precision that much. What matters are the first few, important digits of the fraction, and those are all correct. That's all that matters in the end. I'll be using this result to set timestamps on disk via the utimensat API, and all that really matters is that I get roughly the correct fractions of a second. :)

I use the Python os.utime() wrapper for that API, which takes the nanoseconds as a signed integer: "If ns is specified, it must be a 2-tuple of the form (atime_ns, mtime_ns) where each member is an int expressing nanoseconds."

I'm going to do the straight multiplication and then convert the result to an int. That does the math in one simple step, gets sufficient precision for the decimals (fractions of a second), and solves the issue in a satisfactory way!

Here's the Python code I'll be using. It preserves the current "access time" in nanoseconds by fetching that value from disk, converts the self.unix_mtime float (a UNIX timestamp with fractions of a second as decimals) into a signed 64-bit nanosecond integer, and then applies the change to the target file/directory:

# Good enough precision for practically anybody. Fast.
file_meta = target_path.lstat()
st_mtime_ns = int(self.unix_mtime * 1e9)
os.utime(
    target_path, ns=(file_meta.st_atime_ns, st_mtime_ns), follow_symlinks=False
)

If anyone else wants to do this, beware that I am using lstat() to get the status of symlinks rather than their targets, and follow_symlinks=False to ensure that if the final target_path component is a symlink, I affect the link itself rather than the target. If you prefer affecting the target, change these calls to stat() and follow_symlinks=True, as sketched below. But I would guess that most people prefer affecting the symlink itself when target_path points at one.
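
For reference, a sketch of the follow-the-target variant (follow_symlinks=True is already the default for os.utime, shown explicitly here for clarity):

# Variant: resolve the symlink and modify the target's timestamps instead.
file_meta = target_path.stat()
os.utime(
    target_path, ns=(file_meta.st_atime_ns, st_mtime_ns), follow_symlinks=True
)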

If you care about doing this "seconds-float to nanoseconds int" conversion with the highest achievable precision (by devoting maximum float precision to all the decimal digits to minimize rounding errors), then you can do my "split" variant as follows instead (I added type hints for clarity):

# Great conversion precision. Slower.
file_meta = target_path.lstat()
whole: int = int(self.unix_mtime)
frac: float = self.unix_mtime - whole
st_mtime_ns: int = whole * 1_000_000_000 + int(frac * 1e9)
os.utime(
    target_path, ns=(file_meta.st_atime_ns, st_mtime_ns), follow_symlinks=False
)

As you can see, it uses int * int math for the "whole seconds" and uses float * float math for the "fractions of a second". And then combines the result into an integer. This gives the best of both worlds in terms of accuracy and speed.
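
Here's a quick side-by-side of both variants in the repl (outputs taken from the examples above):

>>> ts = 1687976937.597064
>>> int(ts * 1e9)                                        # simplified version
1687976937597063936
>>> int(ts) * 1_000_000_000 + int((ts - int(ts)) * 1e9)  # more accurate version
1687976937597064018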

I did some benchmarks:

  • 50 million iterations on a Ryzen 3900x CPU.
  • The "simplified, less accurate" version took 11.728529000014532 seconds.
  • The more accurate version took 26.941824199981056 seconds. That's 2.3x the time.
  • Considering that I did 50 million iterations, you can be sure that you can safely use the more accurate version without having to worry about the performance. So if you want more accurate timestamps, feel free to use the last method. :)
  • As a bonus, I benchmarked @dawg's answer, which is the same idea as the more accurate method but done via two calls to math.modf() instead of computing the whole/fraction split directly. It is the slowest at 33.54755139999557 seconds, so I wouldn't recommend it. Besides, the primary idea behind their technique was to discard everything after the first three float decimals, which doesn't matter for any practical purpose; if that truncation is truly desired, it can be achieved without the slow math.modf() calls by changing my "more accurate" variant's final line to whole * 1_000_000_000 + (int(frac * 1e3) * 1_000_000), which runs in 27.95227960000746 seconds.
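
For anyone who wants to reproduce the timings, a rough sketch using timeit (absolute numbers will differ per machine, and the lambda adds a little call overhead of its own):

import timeit

ts = 1687976937.597064

# Simplified, less accurate version.
print(timeit.timeit(lambda: int(ts * 1e9), number=50_000_000))

# More accurate "split" version.
print(timeit.timeit(
    lambda: int(ts) * 1_000_000_000 + int((ts - int(ts)) * 1e9),
    number=50_000_000,
))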

There's also a third method via the discussed decimal library which would have perfect mathematical precision (it doesn't use floats), but it's very slow, so I didn't include it.
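
For completeness, a sketch of that decimal approach. Note it is only exact when the Decimal is built from a string; building it from the float would inherit the float's binary rounding:

>>> from decimal import Decimal
>>> int(Decimal("1687976937.597064") * 1_000_000_000)
1687976937597064000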

  • *why not multiply by a float 1e9 instead? Well, multiplying a float by an int (as above) is a lot faster (on CPUs) than multiplying a float by a float. So it's simply "the most correct way of doing this", even though both methods work in practice.* This really is a form of [micro-optimization](https://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) that may or may not be true. Likely there are larger areas of your code that would benefit from attention more than `float*int` vs `float*float`. – dawg Jun 29 '23 at 12:04
  • Also, it may actually be *slower* to do `float*int_literal` since the int_literal usually is converted to a float anyway (unless a compiler notices a bit shift that can accomplish the multiplication.) See [here](https://stackoverflow.com/a/65387899/298607) – dawg Jun 29 '23 at 12:09
  • @dawg You're right. I picked up that "float multiplication and division is terribly slow" trivia years ago, but it turns out that Python doesn't work that way. Python converts the `1_000_000_000` int to a float. So it's the same as writing `1e9` but with an extra conversion step involved before the multiplication can happen. I benchmarked it and it was about 6% slower to use an int. Although both methods were super fast. Meh. Thanks for that discovery. I'm editing the answer to use an `1e9` float instead. :) I am grateful to have learned something new about float math today. – Mitch McMabers Jun 29 '23 at 14:42