How do you find missing rows in a pandas index containing a range of floats?

Question

I have a CSV file in which one of the columns contains a range of real numbers with a 0.1 step interval. However, some of the values are missing. I've loaded the data into a pandas DataFrame and set this column as the index. I am having difficulty identifying the missing index values due to the way floating point numbers are represented. I would like to find the missing rows and fill them with np.nan.

For example, given

        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.3  80.7  45.9
0.4  10.9  82.3
0.5  37.7  85.4
0.6  90.4  69.9
0.7  17.0  57.2
0.8  70.0  49.6
1.0  58.6  65.1

I would like to get

        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.2   NaN   NaN
0.3  80.7  45.9
0.4  10.9  82.3
0.5  37.7  85.4
0.6  90.4  69.9
0.7  17.0  57.2
0.8  70.0  49.6
0.9   NaN   NaN
1.0  58.6  65.1

I have tried using pandas.DataFrame.reindex, but it's not working due to floating point representation. The following following is what I've tried so far (my data has a much larger range):

import sys

import numpy as np
import pandas as pd

if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

data = """, A, B
0.0, 78.4,  65.9
0.1,  72.9,  29.2
0.3,  80.7,  45.9
0.4,  10.9,  82.3
0.5,  37.7,  85.4
0.6,  90.4,  69.9
0.7,  17.0,  57.2
0.8,  70.0,  49.6
1.0,  58.6,  65.1"""

fp = StringIO(data)
df = pd.read_csv(fp, index_col=0)
print(
    df.reindex(
        pd.Index(
            np.arange(df.index[0], df.index[-1]+0.1, 0.1)
        )
    )
)

This outputs

        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.2   NaN   NaN
0.3   NaN   NaN
0.4  10.9  82.3
0.5  37.7  85.4
0.6   NaN   NaN
0.7   NaN   NaN
0.8  70.0  49.6
0.9   NaN   NaN
1.0  58.6  65.1

score 6 · Accepted Answer · answered Apr 28 '21 at 00:27

Due to the float can not always get the 100% match link, we can pass the tolerance with reindex

out = df.reindex(np.arange(df.index.min(),df.index.max()+0.1,0.1),method='nearest',tolerance=0.01)
        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.2   NaN   NaN
0.3  80.7  45.9
0.4  10.9  82.3
0.5  37.7  85.4
0.6  90.4  69.9
0.7  17.0  57.2
0.8  70.0  49.6
0.9   NaN   NaN
1.0  58.6  65.1

How do you find missing rows in a pandas index containing a range of floats?

1 Answers1