CSV file handling with Pandas for text comparison

Question

I have this csv file called input.csv

KEY;Rate;BYld;DataAsOfDate
CH04;0.719;0.674;2020-01-29
CH03;1.5;0.148;2020-01-29

then I execute the following code:

import pandas as pd
input_df = pd.read_csv('input.csv', sep=";")
input_df.to_csv('output.csv', sep=";")

and get the following output.csv file

KEY;Rate;BYld;DataAsOfDate
CH04;0.7190000000000001;0.674;2020-01-29
CH03;1.5;0.14800000000000002;2020-01-29

I was hoping for and expecting an output like this: (to be able to use a tool like winmerge.org to detect real differences on each row) (my real code truly modifies the dataframe - this stack overflow example is for demonstration only)

KEY;Rate;BYld;DataAsOfDate
CH04;0.719;0.674;2020-01-29
CH03;1.5;0.148;2020-01-29

What is the idiomatic way with to achieve such an unmodified output with Pandas?

I guess that depends on what you're going to do with your csv apart from reading/writing. How about reading it into pandas treating each columns as string instead of float/int...? — Proko, Mar 01 '22 at 14:28

score 1 · Answer 1 · answered Mar 01 '22 at 14:30

1

Python does not use traditional rounding to so as to prevent problems with bankers rounding. However, if being close is not a problem you could use the round function and replace the "2" with whichever number you would like to round to

d = [['CH04',0.719,0.674,'2020-01-29']]
df = pd.DataFrame(d, columns = (['KEY', 'Rate', 'BYld', 'DataAsOfDate']))
df['Rate'] = df['Rate'].apply(lambda x : round(x, 2))
df

answered Mar 01 '22 at 14:30

ArchAngelPwn

2,891
1
4
17

Thank you for having a look and trying to help. The issue I have with this approach is that I don't know the decimal specificity of each column (my real example has 38 columns and I prefer to not go through each and figure out the specifity) - I hope that makes sense! – Henrik K Mar 01 '22 at 14:34
1

For that you have a couple options, if you have 38 columns and one time would look to see that 12 are 1 decimal and 26 are 2 decimals (just for example) you could change the code slightly to resolve multiple at a single time. df[['Rate', 'BYld']] = df[['Rate', 'BYld']].apply(lambda x : round(x, 2)) this would resolve both columns at once so if you studied the data you might be able to still use the round option. – ArchAngelPwn Mar 01 '22 at 15:16

score 0 · Answer 2 · answered Mar 01 '22 at 14:50

0

Using @Prokos idea I changed the code like this:

import pandas as pd
input_df = pd.read_csv('input.csv', dtype='str',sep=";")
input_df.to_csv('str_output.csv', sep=";", index=False)

and that meets the requirement - all columns come out unchanged.

answered Mar 01 '22 at 14:50

Henrik K

1,051
1
15
34

Another thing that I had to do in a similar vein, to keep N/A values from changing between in and out - [33952142](https://stackoverflow.com/questions/33952142/prevent-pandas-from-interpreting-na-as-nan-in-a-string) – Henrik K Mar 01 '22 at 15:13

CSV file handling with Pandas for text comparison

2 Answers2