import pandas as pd
s = pd.Series([141.125, 142.250, 143.375, 143.375, 144.500, 145.125])
s.index=[0.0,1.1,2.2,3.3,4.4,5.5]
s.index
# Float64Index([0.0, 1.1, 2.2, 3.3, 4.4, 5.5], dtype='float64')
s
# 0.0    141.125
# 1.1    142.250
# 2.2    143.375
# 3.3    143.375
# 4.4    144.500
# 5.5    145.125
s.index=s.index.astype('float32')
s.index
# Float64Index([              0.0, 1.100000023841858, 2.200000047683716,
#               3.299999952316284, 4.400000095367432,               5.5],
#              dtype='float64')
# (output from pandas < 2.0; since pandas 2.0 the index keeps dtype='float32')

What's the intuition behind floating point indices? I'm struggling to understand when we would use them instead of int indices (it seems like you can have three types of indices: int64, float64, or object, e.g. s.index=['a','b','c','d','e','f']).

From the code above, it also looks like pandas really wants float indices to be 64-bit: the 64-bit floats get cast to 32-bit floats and then back to 64-bit floats, and the dtype of the index remains 'float64'.
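A small NumPy sketch of what the output above shows (variable names are illustrative, and it assumes NumPy is installed): rounding a float64 to float32 and widening it back keeps the float32 rounding error, which is why 1.1 reappears as 1.100000023841858.

```python
import numpy as np

# The 64-bit float closest to 1.1
x64 = np.float64(1.1)

# Rounding to the nearest 32-bit float loses precision...
x32 = np.float32(x64)

# ...and widening back to 64 bits cannot recover it: the float32 value
# is converted exactly, rounding error included.
widened = np.float64(x32)

print(repr(float(widened)))   # 1.100000023841858
print(widened == x64)         # False
```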

How do people use float indices?

Is the idea that you might have some statistical calculation over data and want to rank on the result of it, but those results may be floats? And we want to force float64 to avoid losing resolution?

cs95
phoenixdown
    Haven't used float indices before; however, they could come in handy for binning, especially if you are dealing with floats. – sammywemmy Jun 14 '20 at 06:54

1 Answer


Float indices are generally useless for label-based indexing because of floating-point precision limits: many decimal values cannot be represented exactly in binary, so the label you type may not match the stored label bit for bit. Of course, pd.Float64Index is there in the API for completeness, but that doesn't always mean you should use it. Jeff (a core library contributor) has this to say on GitHub:

[...] It is rarely necessary to actually use a float index; you are often better off served by using a column. The point of the index is to make individual elements faster, e.g. df[1.0], but this is quite tricky; this is the reason for having an issue about this.
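Following Jeff's suggestion, here is a minimal sketch of keeping the floats in an ordinary column and selecting rows with a tolerance instead of an exact label. The column names `x` and `value` are made up for illustration (the numbers reuse the question's data), and `np.isclose` with its default tolerances is just one reasonable choice.

```python
import numpy as np
import pandas as pd

# Floats live in a regular column; the index stays a plain RangeIndex.
df = pd.DataFrame({
    "x": [0.0, 1.1, 2.2, 3.3, 4.4, 5.5],
    "value": [141.125, 142.250, 143.375, 143.375, 144.500, 145.125],
})

target = 3 * 1.1                  # 3.3000000000000003, not exactly 3.3

# Exact equality misses the row...
print((df["x"] == target).any())  # False

# ...while a tolerance-based mask finds it.
mask = np.isclose(df["x"], target)
print(df.loc[mask, "value"])
```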

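The tricky part Jeff mentions is that two floats that look identical do not always compare equal; it depends on how each value was computed and represented in bits. For example, 0.1 + 0.2 == 0.3 is False in IEEE 754 double precision, and 1.1 rounded to float32 is not equal to 1.1 as a float64.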
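A sketch of the exact-match problem on a float index, using a toy Series (the data is hypothetical). `Index.get_indexer` with `method="nearest"` and a `tolerance` is one way to do a tolerant lookup instead; note that `method="nearest"` requires a monotonic index.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[0.1, 0.2, 0.3])

key = 0.1 + 0.2                 # 0.30000000000000004
print(key == 0.3)               # False
print(0.3 in s.index)           # True
print(key in s.index)           # False

# Exact label lookup fails because the bits differ.
try:
    s.loc[key]
except KeyError:
    print("KeyError: exact label lookup misses")

# Tolerant alternative: nearest label within 1e-9 (index must be monotonic).
pos = s.index.get_indexer([key], method="nearest", tolerance=1e-9)
print(pos)                      # [2]
print(s.iloc[pos[0]])           # 30
```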

Float indices are useful in a few situations (as cited in the GitHub issue), mainly for recording a temporal axis (time) or very small, high-precision measurements, for example in astronomical data. For most other cases, pd.cut or pd.qcut can bin your data, because working with categorical data is usually easier than working with continuous data.
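A minimal sketch of binning with `pd.cut` (explicit bin edges) and `pd.qcut` (quantile-based bins), using made-up measurements. Grouping on the resulting categorical bins sidesteps float-label lookups entirely; the bin edges and `q=4` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical continuous measurements
x = pd.Series([0.12, 0.47, 0.51, 0.88, 1.30, 1.95, 2.40, 2.75])

# Fixed-width bins with explicit edges (right-closed by default: (a, b])
fixed = pd.cut(x, bins=[0, 1, 2, 3])

# Quantile-based bins: roughly equal counts per bin
quart = pd.qcut(x, q=4)

# Aggregate per bin instead of indexing by raw floats
print(x.groupby(fixed, observed=False).mean())
print(quart.value_counts().sort_index())
```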
