Float values as dictionary key

Question

I am developing a class for the analysis of microtiter plates. The samples are described in a separate file, and the entries are used to build an ordered dictionary. One of the keys is the pH, which is usually given as a float, e.g. 6.8.

I could import it as a decimal with Decimal('6.8') in order to avoid a float as a dict key. Another solution would be to replace the dot with, e.g., p, as in 6p8, or to write 6p8 in my sample description and thereby eliminate the problem at the source. But this would cause trouble later on, since I cannot plot a pH of 6p8 in my figures.
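
A minimal sketch of the Decimal option, assuming the pH values arrive as strings from the sample description file. The names `readings` and `by_ph` are made up for illustration. It also shows that a Decimal key is not interchangeable with the "same" float.

```python
from decimal import Decimal

# Hypothetical pH strings as they might appear in a sample description file.
readings = ["6.8", "7.0", "7.2"]

# Decimal objects are hashable, so they can serve as dict keys.
by_ph = {Decimal(text): [] for text in readings}

# Converting back to float for plotting is a single call per key.
x_values = [float(key) for key in by_ph]

# Caveat: Decimal('6.8') is exactly 6.8, while the float 6.8 is only the
# nearest binary approximation, so the two do not compare equal.
float_lookup_works = 6.8 in by_ph
decimal_lookup_works = Decimal("6.8") in by_ph
```

Here `float_lookup_works` is False and `decimal_lookup_works` is True, so lookups would have to use Decimal consistently.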

How would you solve this issue?

Perhaps you want to truncate your float in order to get a handle on its quality prior to using it as a key? – Reblochon Masque, May 18 '14 at 13:01

score 32 · Accepted Answer · answered May 18 '14 at 11:12

There's no problem using floats as dict keys.

Just round(n, 1) them to normalise them to your keyspace, e.g.:

>>> hash(round(6.84, 1))
3543446220
>>> hash(round(6.75, 1))
3543446220
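
As a rough sketch of how this could look for the plate data, assuming pH readings that may arrive as strings or slightly noisy floats (the helper `ph_key` and the well labels are hypothetical):

```python
from collections import OrderedDict


def ph_key(value, ndigits=1):
    """Normalise a pH reading to a dict key (hypothetical helper)."""
    return round(float(value), ndigits)


plate = OrderedDict()
# Made-up readings: a string, a float with measurement noise, a clean float.
for raw_ph, well in [("6.8", "A1"), (6.8000000001, "A2"), (7.2, "B1")]:
    # Readings that round to the same value share one key.
    plate.setdefault(ph_key(raw_ph), []).append(well)

keys = list(plate)
```

Both 6.8 readings end up under the single key 6.8, and the keys are still floats that can be plotted directly.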

John La Rooy

Also a nice solution. And also easy. – Moritz May 18 '14 at 11:20
What is the difference between your solution and Reblochon's? He suggests that I should just use round() without the additional hash(). I do not need very high precision, since my pH conditions always differ by at least 0.1. – Moritz May 18 '14 at 11:38
I didn't mean to suggest you use `hash` explicitly. The `hash` value is how a dict decides that keys _might_ be equal (equal values _must_ have the same hash). This is how the O(1) performance is obtained. I was merely demonstrating that the hash values are the same (this is obvious once you understand that the result of round is identical). – John La Rooy May 18 '14 at 11:41
@gnibbler, why not just use the floats if all the floats will have different values, why is there a need to round? – Padraic Cunningham May 18 '14 at 11:48
@PadraicCunningham, If you are confident the floats are all coming from the same source and being produced the same way, you can get away with this. But I wouldn't place that much trust in data that's not from a source I control. – John La Rooy May 18 '14 at 11:51
Concerned there may be a problem with **hash collisions**. I [tried one](http://stackoverflow.com/a/33459086/673991), but for some reason the dictionary worked anyway. – Bob Stein Nov 01 '15 at 04:04
@BobStein-VisiBone, Dictionaries would be very broken if they relied solely on the uniqueness of hashes. – John La Rooy Nov 01 '15 at 10:41
@JohnLaRooy I have a table with multiple columns, where the first column consists of monotonically increasing floats. I tried using the first column as float keys for a dictionary. I was surprised to find that when printing `a.keys()` the values are no longer ordered, and to ensure a proper plot I had to use `sorted(a.keys())`. Is this normal behaviour? – Alexander Cska Apr 16 '16 at 21:20
This is a bad solution: `round(6.500001) != round(6.4999999)`. – Yichao Zhou Jun 13 '16 at 20:47
@AlexanderCska, yes that is normal. There is an `OrderedDict` in the `collections` module that may work for you. – John La Rooy Jun 13 '16 at 23:37
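
A short sketch of both options, using made-up absorbance values. Note that on Python 3.7 and later a plain dict also preserves insertion order, though neither it nor `OrderedDict` sorts keys by value.

```python
from collections import OrderedDict

# Hypothetical measurements keyed by pH, inserted out of numeric order.
absorbance = OrderedDict()
absorbance[7.2] = 0.41
absorbance[6.8] = 0.35
absorbance[7.0] = 0.38

# Iteration follows insertion order, not numeric order.
insertion_order = list(absorbance)

# For plotting, sort the keys explicitly and pull the values in that order.
xs = sorted(absorbance)
ys = [absorbance[x] for x in xs]
```
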
@YichaoZhou, why is that bad? This is commonly called "round half away from zero". If you need another type of rounding you should convert to `Decimal` objects – John La Rooy Jun 13 '16 at 23:44
@BobStein-VisiBone, to clarify - hashes must be equal for equal values, but the converse is not true. This is another way of saying that hashes need not be unique. You can think of looking up by hash as a filter so you don't need to test against every key. After the hash filter the key is also tested for equality to weed out false positives. – John La Rooy Jun 13 '16 at 23:50
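
The "hash as a filter, equality as the final test" behaviour can be illustrated with a toy key type that collides on purpose. The `Collider` class is invented for this sketch and is not part of the thread.

```python
class Collider:
    """Toy key type whose instances all share one hash (illustration only)."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberately identical for every instance

    def __eq__(self, other):
        return isinstance(other, Collider) and self.name == other.name


table = {Collider("a"): 1, Collider("b"): 2}

# Same hash, different keys: the equality test keeps them apart.
found_a = table[Collider("a")]
found_b = table[Collider("b")]

# Equal values must hash equally, which is why 1 and 1.0 are a single key.
merged = {1: "int", 1.0: "float"}
```

The dictionary still works with colliding hashes; lookups only become slower because more equality tests are needed.
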
@JohnLaRooy None of the rounding methods will guarantee the correct result for a dictionary: a small change in the floating-point number may lead to a different result. I think the only solution is to use some data structure to find the closest floating-point number, but that may be too complex in this case. – Yichao Zhou Jun 15 '16 at 00:24
@YichaoZhou any method which finds the closest floating point number could find a different number if there were a small change in the input, couldn't it? – heltonbiker Aug 31 '17 at 11:29
Rounding can cause troubles (it does not always "truncate"). See this answer to a related question: https://stackoverflow.com/a/783914/1030104 – igr Jan 25 '18 at 09:12
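
The boundary effects discussed in these comments can be reproduced in a few lines. This sketch assumes Python 3, where the built-in `round` resolves exact ties to the nearest even value; `Decimal` is shown as one way to pick the rounding mode explicitly.

```python
from decimal import Decimal, ROUND_HALF_UP

# Two nearly identical inputs on either side of a rounding boundary
# end up under different keys.
low, high = round(6.4999999), round(6.500001)

# Python 3 rounds exact ties to the nearest even value.
ties = [round(0.5), round(1.5), round(2.5)]

# 2.675 cannot be stored exactly in binary; the stored value lies slightly
# below the tie, so it rounds down.
binary_surprise = round(2.675, 2)

# Decimal lets the rounding mode be chosen explicitly.
half_up = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For keys that differ by at least 0.1, as in the question, these boundaries only matter if a reading can land close to a half-way point such as 6.85.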

score 5 · Answer 2 · Reblochon Masque · edited Aug 13 '18 at 15:57

Perhaps you want to truncate your float prior to using it as a key?

Maybe like this:

a = 0.122334
round(a, 4)       #<-- use this as your key?

Your key is now:

0.1223           # still a float, but you have control over its quality

You can use it as follows:

dictionary[round(a, 4)]

to retrieve your values

edited Aug 13 '18 at 15:57

answered May 18 '14 at 11:11

Reblochon Masque

Why do you think this is safer than just using the rounded floats? – John La Rooy May 18 '14 at 11:14
floats are immutable too. In this case your fear is misplaced and gives you a performance penalty. – John La Rooy May 18 '14 at 11:16
Once they are rounded, the binary representation is identical, thus the hashes are too of course. – John La Rooy May 18 '14 at 11:19
Nice solution. And easy. – Moritz May 18 '14 at 11:20
Also the OP wants to plot the data so he would have to cast back to float adding more unnecessary overhead. – Padraic Cunningham May 18 '14 at 12:06

score 1 · Answer 3 · answered Jun 06 '20 at 10:56

Another way would be to enter the keys as strings with the point rather than a p, and then recast them as floats for plotting.

Personally, if you don't insist on the dict format, I would store the data as a pandas dataframe with the pH as a column, as these are easier to pass to plotting libraries.
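
The string-key idea can be sketched as follows. The sample table and the `canonical` helper are hypothetical, and the helper is only one possible way to deal with different spellings of the same value.

```python
# Hypothetical sample table keyed by pH strings, as read from a text file.
samples = {"7.2": "B1", "6.8": "A1", "7.0": "A2"}

# Recast to float only when numeric values are needed, e.g. for plotting.
points = sorted((float(key), well) for key, well in samples.items())


# Caveat: "6.8" and "6.80" are different strings. Formatting through float
# gives one canonical spelling per value.
def canonical(text, ndigits=1):
    """Return a canonical string key (hypothetical helper)."""
    return format(float(text), ".{}f".format(ndigits))


same_key = canonical("6.80") == canonical("6.8")
```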

score 0 · Answer 4 · answered May 10 '20 at 02:26

Another quick option is to use strings of the float

a = 200.01234567890123456789
b = {str(a): 1}
for key in b:
    print(float(key), b[key])

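would print out, on Python 2,

(200.012345679, 1)

Notice that a gets truncated to 12 significant digits, because Python 2's str() shortens floats. On Python 3.2 and later, str() keeps enough digits to convert back to the same float.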
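
A quick check of the round trip, assuming Python 3.2 or later. The `"%.12g"` format is used here only to imitate the 12-significant-digit output of Python 2's `str()`.

```python
a = 200.01234567890123456789

# On Python 3, str() gives the shortest string that converts back exactly.
key = str(a)
round_trips = float(key) == a

# Imitating Python 2's str(): 12 significant digits, which loses information.
short_key = "%.12g" % a
short_round_trips = float(short_key) == a

# Using the string as a dict key and recovering the float afterwards.
b = {key: 1}
recovered = [(float(k), v) for k, v in b.items()]
```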

zyy

David Foerster · Answer 5 · 2021-06-16T14:22:31.200

If you want to use the float-key dictionary in multiple places in your program, it might be worth "hiding" the complexity of how it ought to be used (i.e. with rounded keys) in a new dictionary class (full implementation).

Example:

>>> d = FloatKeyDictionary(2, {math.pi: "foo"})
>>> d
{3.14: 'foo'}

>>> d[3.1415]
'foo'

>>> 3.1 in d
False

>>> d[math.e] = "My hovercraft is full of eels!"
>>> d
{3.14: 'foo', 2.72: 'My hovercraft is full of eels!'}

You can find an abridged version with the general idea below:

import abc
import collections.abc


class KeyTransformDictionaryBase(dict, abc.ABC):

    @abc.abstractmethod
    def __key_transform__(self, key):
        raise NotImplementedError

    def __contains__(self, key):
        return super().__contains__(self.__key_transform__(key))

    def __getitem__(self, key):
        return super().__getitem__(self.__key_transform__(key))

    def __setitem__(self, key, value):
        return super().__setitem__(self.__key_transform__(key), value)

    def __delitem__(self, key):
        return super().__delitem__(self.__key_transform__(key))


class FloatKeyDictionary(KeyTransformDictionaryBase):

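    def __init__(self, rounding_ndigits, data=None):
        super().__init__()
        self.rounding_ndigits = rounding_ndigits
        if data is not None:
            # dict.update() bypasses the overridden __setitem__, so insert
            # item by item to make sure every key is rounded.
            for key, value in dict(data).items():
                self[key] = value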

    def __key_transform__(self, key):
        return round(key, self.rounding_ndigits)
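
One design caveat: methods inherited from `dict`, such as `get`, `pop`, `setdefault` and `update`, do not go through the overridden `__getitem__` and `__setitem__`, so a `dict` subclass has to override them as well. An alternative sketch, not the author's implementation, builds on `collections.abc.MutableMapping`, whose mixin methods are all defined in terms of the five methods below. The class name `RoundedKeyDict` is made up here.

```python
import math
from collections.abc import MutableMapping


class RoundedKeyDict(MutableMapping):
    """Mapping that rounds float keys before every access (sketch only)."""

    def __init__(self, ndigits, data=None):
        self._ndigits = ndigits
        self._store = {}
        if data is not None:
            self.update(data)  # MutableMapping.update calls __setitem__

    def _key(self, key):
        return round(key, self._ndigits)

    def __getitem__(self, key):
        return self._store[self._key(key)]

    def __setitem__(self, key, value):
        self._store[self._key(key)] = value

    def __delitem__(self, key):
        del self._store[self._key(key)]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return repr(self._store)


d = RoundedKeyDict(2, {math.pi: "foo"})
via_getitem = d[3.1415]
via_get = d.get(3.14159)        # inherited get() also rounds the key
missing = 3.1 in d              # inherited __contains__ rounds as well
```

The trade-off is that the result is no longer an instance of `dict`, which matters for code that checks `isinstance(obj, dict)`.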
