31

I am developing a class for the analysis of microtiter plates. The samples are described in a separate file, and the entries are used to build an ordered dictionary. One of the keys is the pH, which is usually given as a float, e.g. 6.8.

I could import it as a decimal with Decimal('6.8') in order to avoid using a float as a dict key. Another solution would be to replace the dot with a letter such as p (6p8), or to write 6p8 directly in my sample description and thereby eliminate the problem at the source. But this would cause trouble later on, since I cannot plot a pH of 6p8 in my figures.

How would you solve this issue?

Martijn Pieters
Moritz

5 Answers

32

There's no problem using floats as dict keys.

Just round(n, 1) them to normalise them to your keyspace. Values that round to the same number also hash to the same value, e.g.

>>> hash(round(6.84, 1))
3543446220
>>> hash(round(6.75, 1))
3543446220
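As a minimal sketch of how this plays out in an actual dict: the helper name `ph_key`, the sample label and the slightly-off reading below are made up for illustration and are not part of the original answer.

```python
# Hypothetical helper: normalise a pH reading to one decimal place,
# so that slightly different floats map to the same dict key.
def ph_key(value):
    return round(value, 1)


samples = {}
samples[ph_key(6.8)] = "buffer A"

# A reading that is close to, but not exactly, 6.8
# (e.g. parsed from a different file or computed).
reading = 6.80000001

print(reading == 6.8)            # False: the raw floats differ
print(reading in samples)        # False: the raw float is not a key
print(samples[ph_key(reading)])  # buffer A: the rounded key matches
```

Since the pH conditions in the question differ by at least 0.1, rounding to one decimal place should not merge distinct conditions.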
John La Rooy
  • Also a nice solution. And also easy. – Moritz May 18 '14 at 11:20
  • 1
    What is the difference between your solution and Reblochon's? He suggests that I should just use round() without the additional hash(). I do not need very high precision since my pH conditions always differ by at least 0.1. – Moritz May 18 '14 at 11:38
  • 6
    I didn't mean to suggest you use `hash` explicitly. The `hash` value is how a dict decides that keys _might_ be equal (equal values _must_ have the same hash). This is how the O(1) performance is obtained. I was merely demonstrating that the hash values are the same ( this is obvious when you understand that the result of round is identical) – John La Rooy May 18 '14 at 11:41
  • 1
    @gnibbler, why not just use the floats if all the floats will have different values, why is there a need to round? – Padraic Cunningham May 18 '14 at 11:48
  • 4
    @PadraicCunningham, If you are confident the floats are all coming from the same source and being produced the same way, you can get away with this. But I wouldn't place that much trust in data that's not from a source I control. – John La Rooy May 18 '14 at 11:51
  • Concerned there may be a problem with **hash collisions**. I [tried one](http://stackoverflow.com/a/33459086/673991), but for some reason the dictionary worked anyway. – Bob Stein Nov 01 '15 at 04:04
  • 1
    @BobStein-VisiBone, Dictionaries would be very broken if they relied solely on the uniqueness of hashes. – John La Rooy Nov 01 '15 at 10:41
  • @JohnLaRooy I have a table, with multiple columns, where the first column consists of monotonically increasing floats. I tried using the first column as float keys for a dictionary. I was surprised to find, that when printing `a.keys()` the values are no longer ordered and to ensure a proper plot I had to use `sorted(a.keys())`. Is this a normal behaviour? – Alexander Cska Apr 16 '16 at 21:20
  • This is a bad solution: `round(6.500001) != round(6.4999999)`. – Yichao Zhou Jun 13 '16 at 20:47
  • @AlexanderCska, yes that is normal. There is an `OrderedDict` in the `collections` module that may work for you. – John La Rooy Jun 13 '16 at 23:37
  • @YichaoZhou, why is that bad? This is commonly called "round half away from zero". If you need another type of rounding you should convert to `Decimal` objects – John La Rooy Jun 13 '16 at 23:44
  • 4
    @BobStein-VisiBone, to clarify - hashes must be equal for equal values, but the converse is not true. This is another way of saying that hashes need not be unique. You can think of looking up by hash as a filter so you don't need to test against every key. After the hash filter the key is also tested for equality to weed out false positives. – John La Rooy Jun 13 '16 at 23:50
  • @JohnLaRooy None of the rounding methods will guarantee the correct result for a dictionary: a small change in the floating-point number may lead to a different key. I think the only solution is to use some data structure to find the closest floating-point number, but that may be too complex in this case. – Yichao Zhou Jun 15 '16 at 00:24
  • @YichaoZhou any method which finds the closest floating point number could find a different number if there were a small change in the input, couldn't it? – heltonbiker Aug 31 '17 at 11:29
  • Rounding can cause trouble (it does not always "truncate"). See this answer to a related question: https://stackoverflow.com/a/783914/1030104 – igr Jan 25 '18 at 09:12
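The point made in the comments, that a dict uses the hash only as a first filter and then checks equality, can be illustrated with a small sketch. The `Key` class is a toy type invented for this example; it forces every instance to collide on purpose.

```python
class Key:
    """Toy key type whose hash is deliberately identical for all instances."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # every instance collides with every other

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


d = {Key("a"): 1, Key("b"): 2}

print(len(d))       # 2: equal hashes alone do not merge the keys
print(d[Key("a")])  # 1: the equality check picks the right entry
print(d[Key("b")])  # 2
```

Lookups stay correct despite the collisions; what suffers is only performance, because colliding keys have to be compared one by one.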
5

Perhaps you want to round your float before using it as a key?

Maybe like this:

a = 0.122334
round(a, 4)       #<-- use this as your key?

Your key is now:

0.1223           # still a float, but you have control over its quality

You can use it as follows:

dictionary[round(a, 4)]   

to retrieve your values

Reblochon Masque
1

Another way would be to enter the keys as strings with the point rather than a p, and then convert them to floats for plotting.

Personally, if you don't insist on the dict format, I would store the data as a pandas DataFrame with the pH as a column, as these are easier to pass to plotting libraries.
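A minimal sketch of the string-key idea, using only the standard library; the well names and pH values are invented for illustration.

```python
# Keys stay exactly as written in the sample description file.
samples = {"7.2": "well B1", "6.8": "well A1", "10.0": "well C1"}

# Convert to float only when numbers are needed, e.g. for plotting.
# Sorting with key=float orders numerically; plain string sorting
# would put "10.0" before "6.8".
ordered_keys = sorted(samples, key=float)
ph_values = [float(k) for k in ordered_keys]
labels = [samples[k] for k in ordered_keys]

print(ph_values)  # [6.8, 7.2, 10.0]
print(labels)     # ['well A1', 'well B1', 'well C1']
```

One caveat of this approach is that "6.8" and "6.80" are different string keys even though they denote the same pH, so the spelling in the description file has to be consistent.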

Plamen
0

Another quick option is to use the string representation of the float as the key:

a = 200.01234567890123456789
b = {str(a): 1}
for key in b:
    print(float(key), b[key])

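would print out (under Python 2)

(200.012345679, 1)

Notice that a is cut off after the ninth digit past the decimal point: Python 2's str() keeps only 12 significant digits of a float. In Python 3, str() returns the shortest string that converts back to the same float, so no precision is lost there.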
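If the cut-off should be explicit rather than depend on how str() behaves, one option is to format the key with a fixed number of decimals. This is only a sketch; the helper name `key_for` and the sample values are assumptions made for this example.

```python
def key_for(value, ndigits=1):
    # Hypothetical helper: fixed-point text with a chosen number of decimals.
    return format(value, ".{}f".format(ndigits))


b = {key_for(6.8): "sample"}

print(key_for(6.80000001))     # 6.8
print(b[key_for(6.80000001)])  # sample

# Convert back to a float when a number is needed, e.g. for plotting.
print(float(key_for(6.8)))     # 6.8
```

Unlike plain str(), this gives the same key text on Python 2 and Python 3 for the same number of decimals.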

zyy
0

If you want to use the float-key dictionary in multiple places in your program, it might be worth "hiding" the complexity of how it ought to be used (i.e. with rounded keys) in a new dictionary class (full implementation).

Example:

>>> d = FloatKeyDictionary(2, {math.pi: "foo"})
>>> d
{3.14: 'foo'}

>>> d[3.1415]
'foo'

>>> 3.1 in d
False

>>> d[math.e] = "My hovercraft is full of eels!"
>>> d
{3.14: 'foo', 2.72: 'My hovercraft is full of eels!'}

You can find an abridged version with the general idea below:

import abc
import collections.abc


class KeyTransformDictionaryBase(dict, abc.ABC):

    @abc.abstractmethod
    def __key_transform__(self, key):
        raise NotImplementedError

    def __contains__(self, key):
        return super().__contains__(self.__key_transform__(key))

    def __getitem__(self, key):
        return super().__getitem__(self.__key_transform__(key))

    def __setitem__(self, key, value):
        return super().__setitem__(self.__key_transform__(key), value)

    def __delitem__(self, key):
        return super().__delitem__(self.__key_transform__(key))


class FloatKeyDictionary(KeyTransformDictionaryBase):

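    def __init__(self, rounding_ndigits, data=None):
        super().__init__()
        self.rounding_ndigits = rounding_ndigits
        if data is not None:
            # dict.update() bypasses the overridden __setitem__, so
            # assign item by item to make sure every key is rounded.
            for key, value in dict(data).items():
                self[key] = value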

    def __key_transform__(self, key):
        return round(key, self.rounding_ndigits)
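One thing to keep in mind with a dict subclass is that, in CPython, built-in methods such as get(), pop() and update() do not call the overridden __getitem__ and __setitem__, so they need to be overridden as well. As a possible alternative (not the implementation from this answer), here is a sketch based on collections.abc.MutableMapping, whose mixin methods are all routed through the five methods you define. The class name `RoundedKeyDict` is made up for this example.

```python
from collections.abc import MutableMapping


class RoundedKeyDict(MutableMapping):
    """Mapping that rounds every key to a fixed number of digits."""

    def __init__(self, ndigits, data=None):
        self._ndigits = ndigits
        self._store = {}
        if data is not None:
            # MutableMapping.update() goes through __setitem__,
            # so the initial keys are rounded too.
            self.update(data)

    def _transform(self, key):
        return round(key, self._ndigits)

    def __getitem__(self, key):
        return self._store[self._transform(key)]

    def __setitem__(self, key, value):
        self._store[self._transform(key)] = value

    def __delitem__(self, key):
        del self._store[self._transform(key)]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return repr(self._store)


d = RoundedKeyDict(1, {6.80000001: "A1"})
print(d[6.8])       # A1
print(d.get(6.84))  # A1: get() also rounds the key
print(6.9 in d)     # False
```

The trade-off is that the result is no longer an instance of dict, which matters only if other code checks isinstance(obj, dict).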
David Foerster