I have written a Python script that loads one or more CSV files, concatenates them and writes the result into a single new CSV file. I have noticed that some values are modified during this operation: they get slightly incremented or decremented by very small amounts. As an example:
Original CSV:
Index SomeValue
0.000000 0.000
1.000000 0.000
2.000000 0.000
3.000000 0.000
4.000000 2.527
5.000000 0.000
Saved CSV:
Index SomeValue
0.0 0.0
1.0 0.0
2.0 0.0
3.0 0.0
4.0 2.5269999999999997
5.0 0.0
This looks like a full-scale error to me, but I don't know what causes it. The pandas core of my script, which is called in a loop, is:
# Reading the current csv file
l_tmpCsv_st = pd.read_csv(l_listElement_tc, sep='\t', index_col=0)
l_listOfCsvFiles_tst.append(l_tmpCsv_st)
# Filling NaN cells with the value "missing" to distinguish between a true NaN
# and a value that is absent only because of lacking padding
l_listOfCsvFiles_tst[-1] = l_listOfCsvFiles_tst[-1].fillna(value='missing')
# Concatenating the current csv file with the previous ones
csvFusion = pd.concat([csvFusion, l_listOfCsvFiles_tst[-1]], axis=1)
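For completeness, this core is driven by a loop over the input files, roughly like the following self-contained sketch (l_fileList_tc and the file names are placeholders, not my actual paths):

import pandas as pd

l_fileList_tc = ['file1.csv', 'file2.csv']  # placeholder input paths
csvFusion = pd.DataFrame()
l_listOfCsvFiles_tst = []

for l_listElement_tc in l_fileList_tc:
    l_tmpCsv_st = pd.read_csv(l_listElement_tc, sep='\t', index_col=0)
    l_listOfCsvFiles_tst.append(l_tmpCsv_st)
    # Mark true gaps before concatenation, as above
    l_listOfCsvFiles_tst[-1] = l_listOfCsvFiles_tst[-1].fillna(value='missing')
    csvFusion = pd.concat([csvFusion, l_listOfCsvFiles_tst[-1]], axis=1)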
And after the loop:
# Padding missing values of lower-frequency files
csvFusion = csvFusion.fillna(method='pad')
# Determining which columns need to be deleted (all "Unnamed" columns are
# artifacts of the pandas read and need to be removed)
l_listColumnsToDelete_tst = [col for col in csvFusion.columns if 'Unnamed' in col]
# Dropping these columns
csvFusion.drop(l_listColumnsToDelete_tst, axis=1, inplace=True)
# Writing the full result to file
csvFusion.to_csv(l_endFile_tc, sep='\t', decimal=',', na_rep='-')
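As an aside, the "Unnamed" columns mentioned in the comments seem to appear whenever a line ends with a trailing separator, which makes pandas see an extra, empty header field; a minimal sketch reproducing this:

from io import StringIO
import pandas as pd

# Every line ends with a trailing tab, creating an empty third field
sample = "Index\tSomeValue\t\n0.000000\t0.000\t\n"
df = pd.read_csv(StringIO(sample), sep='\t', index_col=0)
print(df.columns.tolist())  # ['SomeValue', 'Unnamed: 2']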
The rest of my script is unrelated to pandas and would only hurt readability, so I have left it out of the copy/paste.
How could I avoid this issue?
Thanks in advance,
Edit:
It was indeed a floating-point error. Rounding every value to a sufficiently high number of decimal places solved it:
for col in csvFusion.columns:
    csvFusion[col] = csvFusion[col].round(15)
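For the record, the drift is easy to reproduce in plain Python, since most decimal fractions have no exact binary representation (the classic 0.1 example, not my actual data):

>>> 0.1 + 0.2
0.30000000000000004
>>> from decimal import Decimal
>>> Decimal(0.1)  # the exact binary value stored for 0.1
Decimal('0.1000000000000000055511151231257827021181583404541015625')

The loop above could presumably also be replaced by the one-liner csvFusion = csvFusion.round(15), and to_csv accepts a float_format argument (e.g. float_format='%.3f') to control the output precision directly, though I have not checked how that interacts with decimal=','.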