
I tried to generate a range of numbers and name the output columns after them, using an np.arange loop as follows:

def conditional_zero_column(filename="random.csv"):
    df = pd.read_csv(filename)
    for i in np.arange(0.6,1.0,0.01):
        df['reject'+str(i)] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df['reject'+str(i)] = np.where(df['reject'+str(i)] >= i, df['reject'+str(i)], 0.0)
        df.to_csv("random_data4.csv", index=False)

The conditional logic worked for me.

The column names were fine from reject0.6 to reject0.68. After that, the column names contained unexpected numbers, e.g., reject0.6900000000000001, reject0.8000000000000002, reject0.9900000000000003, as the attached picture shows.

[Image: column names]

I am curious why the numbers change after 0.69. I tried simply replacing np.arange with np.linspace, but that did not work for me. Is something wrong in my code?

I appreciate any help you can provide.

MattD626
  • Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – mkrieger1 Jun 16 '22 at 07:21

1 Answer


You are seeing the tail end of floating-point precision. Most decimal fractions cannot be represented exactly in binary floating point, so each value carries a tiny error, and that error shows up in the last digits when the number is converted to a string.
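
As a rough illustration of that tail, here is a small sketch. The variable names (`total`, `raw_names`, `clean_names`) are made up for this example and are not part of the original code. It compares `str()`, which prints every digit needed to reproduce the float, with a format spec that rounds for display.

```python
import numpy as np

# A classic case: the sum is not exactly 0.3 in binary floating point.
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(f"{total:.2f}")  # 0.30

# The same effect in the loop from the question: str() exposes the tail,
# while the ".2f" format spec rounds each value to two decimals.
values = np.arange(0.6, 1.0, 0.01)
raw_names = ["reject" + str(v) for v in values]
clean_names = [f"reject{v:.2f}" for v in values]

print(raw_names[:12])
print(clean_names[:12])
```

The rounding only affects the label; the underlying float used in the comparisons is unchanged.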

I think you can solve this by formatting the strings you are using for column names.

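import numpy as np
import pandas as pd

def conditional_zero_column(filename="random.csv"):
    df = pd.read_csv(filename)
    for i in np.arange(0.6, 1.0, 0.01):
        # Format to two decimals so the floating-point tail stays out of the name
        col = f'reject{i:.2f}'
        df[col] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df[col] = np.where(df[col] >= i, df[col], 0.0)
    # Write the file once, after all columns have been added
    df.to_csv("random_data4.csv", index=False)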
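
If you would rather avoid the float step altogether, one possible alternative (a sketch, not the only way) is to loop over integers and divide once per iteration. The helper name `add_reject_columns`, its `start`/`stop` parameters, and the sample data are hypothetical, and the helper takes a DataFrame instead of reading a CSV so that it is easy to try out.

```python
import numpy as np
import pandas as pd

def add_reject_columns(df, start=60, stop=100):
    """Hypothetical helper: thresholds are integer hundredths (60 -> 0.60)."""
    out = df.copy()
    for k in range(start, stop):
        t = k / 100  # a single division, so no error builds up across steps
        col = f"reject{t:.2f}"
        # Same two-step logic as in the question
        capped = np.where(out["expected_discount"] < t, out["expected_discount"], t)
        out[col] = np.where(capped >= t, capped, 0.0)
    return out

# Made-up sample data, only for illustration
sample = pd.DataFrame({"expected_discount": [0.55, 0.69, 0.95]})
result = add_reject_columns(sample)
print(list(result.columns)[:4])
```

Formatting the name is still worthwhile here, since it keeps a fixed width (`reject0.60` rather than `reject0.6`).
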
James