
I am calculating the duration of data acquisition from some sensors. Although the data is collected faster, I would like to sample it at 10 Hz. I created a dataframe with a column called 'Time_diff', which I expect to go [0.0, 0.1, 0.2, 0.3 ...]. However, it goes something like [0.0, 0.1, 0.2, 0.30000004 ...]. I am rounding the dataframe, but I still get these weird decimals. Are there any suggestions on how to fix it?

The code:

for i in range(self.n_of_trials):
    start = np.zeros(0)
    stop = np.zeros(0)
    for df in self.trials[i].df_list:
        start = np.append(start, df['Time'].iloc[0])
        stop = np.append(stop, df['Time'].iloc[-1])
    t_start = start.min()
    t_stop = stop.max()
    self.trials[i].duration = t_stop - t_start
    t = np.arange(0, self.trials[i].duration + self.trials[i].dt, self.trials[i].dt)
    self.trials[i].df_merged['Time_diff'] = t
    # round() returns a new DataFrame, so the result must be assigned back
    self.trials[i].df_merged = self.trials[i].df_merged.round(1)

when I print the data it looks like this:

0       0.0
1       0.1
2       0.2
3       0.3
4       0.4
       ... 
732    73.2
733    73.3
734    73.4
735    73.5
736    73.6
Name: Time_diff, Length: 737, dtype: float64

However, when I open it as a CSV file, the extra decimals are visible (screenshot of the CSV omitted).

Addition

I think the problem is not the CSV conversion but how the float data is created/rounded. Here is the next part of the code, where I merge more dataframes on the 10 Hz timestamps:

    for j in range(len(self.trials[i].df_list)):
        df = self.trials[i].df_list[j]
        df.insert(0, 'Time_diff', round(df['Time'] - t_start, 1))
        # round() returns a copy; assign it back (or drop this line)
        df = df.round({'Time_diff': 1})
        df.drop_duplicates(subset=['Time_diff'], keep='first', inplace=True)
        self.trials[i].df_merged = pd.merge(
            self.trials[i].df_merged, df, how="outer", on="Time_diff",
            suffixes=(None, '_' + self.trials[i].df_list_names[j]))

# Test csv
self.trials[2].df_merged.to_csv(path_or_buf='merged.csv')

And since the inserted dataframes have the exact, correct decimals while `df_merged` does not, the rows are not matched in the merge, and each mismatch creates another row with a new index.


TheClem
  • Does this answer your question? [pandas.DataFrame.round doesn't seem to work on my DataFrames - Rounding issue causes extra data stored in csv files](https://stackoverflow.com/questions/47066858/pandas-dataframe-round-doesnt-seem-to-work-on-my-dataframes-rounding-issue-ca) – LazyClown Oct 26 '22 at 10:06
  • I am afraid that is not the solution I need, because I think the problem is not necessarily the CSV conversion. It is about how the float is created/rounded. I merge this timestamp dataframe with another dataframe, but they are not merged because the column elements are not exact. – TheClem Oct 26 '22 at 12:25
  • I also tried pd.merge_asof, but it does not support the *how* parameter, so I would like to create proper timestamps and use pd.merge instead of pd.merge_asof. – TheClem Oct 26 '22 at 12:27

1 Answer


This is not a rounding problem; it is behavior intrinsic to how floating-point numbers work. In fact, 0.30000000000000004 is the result of 0.1 + 0.1 + 0.1 (try it yourself at a Python prompt).

In practice, not every decimal number is exactly representable as a floating-point number, so what you get is the closest representable value instead.
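You can see the drift directly at a Python prompt; this tiny sketch just demonstrates the effect described above:

```python
# 0.1 has no exact binary representation, so repeated addition drifts
total = 0.1 + 0.1 + 0.1
print(total)         # 0.30000000000000004
print(total == 0.3)  # False
```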

You have a few options, depending on whether you just want to improve the display or need to work with exact values. If, for example, you want to use that column for a merge, you can use an approximate comparison instead of an exact one.
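As a sketch of what "approximate comparison" means here: `np.arange` produces a value near 0.3 that fails an exact equality test but passes a tolerance-based one with `np.isclose`:

```python
import numpy as np

a = np.arange(0, 0.5, 0.1)[3]  # 0.30000000000000004, accumulated step error
b = 0.3

print(a == b)            # False: exact comparison fails
print(np.isclose(a, b))  # True: comparison within a small tolerance succeeds
```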

Another option is the decimal module (https://docs.python.org/3/library/decimal.html), which performs exact decimal arithmetic but can be slower.
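With `Decimal`, the same sum that drifts with floats stays exact (note that you must construct the values from strings, not from floats):

```python
from decimal import Decimal

# Exact decimal arithmetic: no representation error accumulates
t = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
print(t)                     # 0.3
print(t == Decimal('0.3'))   # True
```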

In your case you said the column represents timestamps at 10 Hz, i.e. steps of 0.1 s, so changing the representation to integers (for example 1, 2, 3, ... tenths of a second) will let you use integers instead of floats.
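A minimal sketch of that idea, with made-up frames standing in for your `df_merged` and `df_list` entries: derive an exact integer tick from each float timestamp and merge on the ticks, keeping the float column only for display.

```python
import numpy as np
import pandas as pd

dt = 0.1
t = np.arange(0, 0.5 + dt, dt)  # contains 0.30000000000000004
left = pd.DataFrame({'Time_diff': t, 'a': range(len(t))})
right = pd.DataFrame({'Time_diff': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
                      'b': range(6)})

# Convert to an exact integer key (tenths of a second) before merging
for frame in (left, right):
    frame['tick'] = (frame['Time_diff'] * 10).round().astype(int)

merged = pd.merge(left.drop(columns='Time_diff'),
                  right.drop(columns='Time_diff'),
                  how='outer', on='tick')
merged['Time_diff'] = merged['tick'] / 10  # float view for display only
```

Merging directly on the float `Time_diff` columns would miss the 0.3 row; the integer key matches every row while preserving `pd.merge`'s `how` parameter.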

If you want to see the "true" value of a floating-point number in Python, you can use format(0.1*6, '.30f'); it prints the number with 30 digits (still an approximation, but much closer than the default repr).
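Applied to the two "0.3-looking" values discussed here, this exposes that they are different doubles underneath:

```python
# Reveal the underlying double behind two values that can both display as 0.3
print(format(0.1 * 3, '.30f'))  # 0.300000000000000044408920985006
print(format(0.3, '.30f'))      # 0.299999999999999988897769753748
```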

marco romelli
  • I am not sure I understand correctly. If `0.30000000000000004` is how 0.3 is stored/represented as a float, why is my second instance an exact(?) 0.3? As I mentioned in the previous comment, I don't want to use an approximate merge. I can change everything to integers by simply multiplying by 10, but I believe there is a way of solving it without changing the representation. – TheClem Oct 27 '22 at 11:25
  • Many times what you see is not what actually is behind the scenes. Programming languages have different ways of showing float numbers. I don't know why you see 0.3 once but it depends where the number comes from. Actually 0.30000000000000004 is not the underlying representation of 0.3. It is just the result of 0.1+0.1+0.1. The underlying representation of 0.3 would be 0.299999999999999988897769753748434595763683319091796875 (check out https://www.exploringbinary.com/floating-point-converter/) – marco romelli Oct 27 '22 at 11:35
  • Additionally I don't think it makes sense to merge on a float column if you don't allow at least a minimum amount of tolerance. There would be unexpected results almost for sure. – marco romelli Oct 27 '22 at 11:41
  • I understand what you mean :) I just don't understand why the two 0.3s are different in this particular example. The reason I didn't want to use an approximate merge is that `pd.merge` does not have a *tolerance* parameter and `pd.merge_asof` does not have a *how* parameter. I didn't want to sacrifice the *how* parameter. – TheClem Oct 27 '22 at 11:46
  • So if I understand why these two 0.3s are created/stored differently, I can create both the same way instead. My hunch is that the `t = np.arange(0, self.trials[i].duration+self.trials[i].dt, self.trials[i].dt)` line could be changed, for instance. I think `df.insert(0, 'Time_diff', round(df['Time']-t_start, 1))` gives the decimals I want when I check the data afterwards. – TheClem Oct 27 '22 at 11:49
  • They are different because they probably come from 2 different operations. Check out the difference between `format(0.1*3, '.30f')` and `format(0.3, '.30f')`. Here I think is a good way of merging on float columns: https://stackoverflow.com/a/41136518/9055614. Rounding can't work; maybe you see what you expect, but behind it there is still a different number. – marco romelli Oct 27 '22 at 11:53