1

I have a CSV file in which one of the columns contains a range of real numbers with a 0.1 step interval. However, some of the values are missing. I've loaded the data into a pandas DataFrame and set this column as the index. I am having difficulty identifying the missing index values due to the way floating point numbers are represented. I would like to find the missing rows and fill them with np.nan.

For example, given

        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.3  80.7  45.9
0.4  10.9  82.3
0.5  37.7  85.4
0.6  90.4  69.9
0.7  17.0  57.2
0.8  70.0  49.6
1.0  58.6  65.1

I would like to get

        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.2   NaN   NaN
0.3  80.7  45.9
0.4  10.9  82.3
0.5  37.7  85.4
0.6  90.4  69.9
0.7  17.0  57.2
0.8  70.0  49.6
0.9   NaN   NaN
1.0  58.6  65.1

I have tried using pandas.DataFrame.reindex, but it's not working due to floating point representation. The following following is what I've tried so far (my data has a much larger range):

import sys

import numpy as np
import pandas as pd

if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

data = """, A, B
0.0, 78.4,  65.9
0.1,  72.9,  29.2
0.3,  80.7,  45.9
0.4,  10.9,  82.3
0.5,  37.7,  85.4
0.6,  90.4,  69.9
0.7,  17.0,  57.2
0.8,  70.0,  49.6
1.0,  58.6,  65.1"""

fp = StringIO(data)
df = pd.read_csv(fp, index_col=0)
print(
    df.reindex(
        pd.Index(
            np.arange(df.index[0], df.index[-1]+0.1, 0.1)
        )
    )
)

This outputs

        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.2   NaN   NaN
0.3   NaN   NaN
0.4  10.9  82.3
0.5  37.7  85.4
0.6   NaN   NaN
0.7   NaN   NaN
0.8  70.0  49.6
0.9   NaN   NaN
1.0  58.6  65.1
Jeff
  • 53
  • 3

1 Answers1

6

Due to the float can not always get the 100% match link, we can pass the tolerance with reindex

out = df.reindex(np.arange(df.index.min(),df.index.max()+0.1,0.1),method='nearest',tolerance=0.01)
        A     B
0.0  78.4  65.9
0.1  72.9  29.2
0.2   NaN   NaN
0.3  80.7  45.9
0.4  10.9  82.3
0.5  37.7  85.4
0.6  90.4  69.9
0.7  17.0  57.2
0.8  70.0  49.6
0.9   NaN   NaN
1.0  58.6  65.1
BENY
  • 317,841
  • 20
  • 164
  • 234