NB: My question is not a duplicate of "Format floats with standard json module". In fact, Mark Dickinson provided a good answer to my question in one of his comments, and that answer is all about pandas.read_csv, which is not even mentioned in the earlier post. Although [pandas] was one of the post's tags from the beginning, I have now edited the title to make the connection with pandas explicit.
As a minimal example, suppose that I have a file foo.csv with the following content:
foo
-482.044
Now, if I read this file with pandas.read_csv, convert its first row to a dict, and serialize that dict with simplejson.dumps, I get the following:
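import pandas
import simplejson

# list() makes the indexing work on Python 3 too, where dict.values() is a view
simplejson.dumps(list(pandas.read_csv('/tmp/foo.csv')
                      .to_dict(orient='index')
                      .values())[0])
# '{"foo": -482.04400000000004}'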
IOW, the original -482.044 became -482.04400000000004.
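For readers unfamiliar with the underlying issue, a minimal stdlib-only check (no pandas involved) shows that the two literals denote two different IEEE 754 doubles, and that a correctly rounded parse of the original CSV text serializes straight back to that text. This uses the standard json module rather than simplejson, on the assumption that both format floats with Python's shortest round-tripping repr.

```python
import json

original = float('-482.044')             # correctly rounded parse of the CSV text
observed = float('-482.04400000000004')  # the value pandas produced above

# They are distinct doubles, not two spellings of the same value
assert original != observed

# json formats floats with Python's shortest round-tripping repr,
# so each double serializes back to its own literal
print(json.dumps({'foo': original}))  # {"foo": -482.044}
print(json.dumps({'foo': observed}))  # {"foo": -482.04400000000004}
```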
NB: I understand why this happens.
What I'm looking for is some convenient way to get around it.
IOW, the desired JSON string in this case is something like
'{"foo": -482.044}'
I'm looking for a convenient way to generate this string, starting from the file foo.csv shown earlier.
Needless to say, this example is unrealistically simple. In practice, foo.csv would contain thousands/millions of rows and tens/hundreds of columns, not necessarily all floats (or even numeric). I'm only interested in solutions that would work for such real-life data.
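One way to handle mixed real-life columns without relying on pandas' float parser at all is to convert each field with Python's own int()/float(), which are correctly rounded. The sketch below uses the stdlib csv module; the helper names convert and csv_to_json_lines are hypothetical, and the int, then float, then text fallback is an assumption about what the columns contain. Caveat: this preserves the value the CSV text denotes, not its exact spelling, so 1.50 becomes 1.5, and text such as "nan" or "inf" would be parsed as a float.

```python
import csv
import io
import json

def convert(field):
    """Hypothetical per-field converter: try int, then float, else keep the text."""
    for cast in (int, float):
        try:
            return cast(field)
        except (TypeError, ValueError):  # TypeError covers None for short rows
            pass
    return field

def csv_to_json_lines(text):
    """Yield one JSON object string per CSV row (assumes a header row)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield json.dumps({key: convert(value) for key, value in row.items()})

sample = 'foo,name\n-482.044,alpha\n1.50,beta\n'
for line in csv_to_json_lines(sample):
    print(line)
# {"foo": -482.044, "name": "alpha"}
# {"foo": 1.5, "name": "beta"}
```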
Of course, I could avoid floating-point issues altogether by passing dtype=str to pandas.read_csv, but this would not produce the desired result:
import pandas
import simplejson

simplejson.dumps(list(pandas.read_csv('/tmp/foo.csv', dtype=str)
                      .to_dict(orient='index')
                      .values())[0])
# the number is now serialized as a JSON string:
# '{"foo": "-482.044"}'
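If the goal is literally to copy each number's CSV spelling into the JSON, one option is to keep dtype=str and emit every string that already matches the JSON number grammar (RFC 8259) without quotes. The stdlib sketch below assumes flat rows of strings; dumps_verbatim is a hypothetical helper, not part of json or simplejson. The rows could come from something like pandas.read_csv(path, dtype=str, keep_default_na=False).to_dict(orient='records'), where keep_default_na=False keeps empty cells as '' instead of NaN. Caveat: a text column whose values happen to look like numbers (e.g. 10001 as a ZIP code) would lose its quotes too.

```python
import json
import re

# JSON number grammar from RFC 8259: optional minus, integer part without
# leading zeros, optional fraction, optional exponent
_JSON_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?')

def dumps_verbatim(row):
    """Serialize a flat dict, copying numeric-looking strings unquoted.

    Hypothetical helper: any str value that is already a valid JSON number is
    written exactly as it appears; every other value goes through json.dumps.
    Uses the same ', ' and ': ' separators as json.dumps does by default.
    """
    parts = []
    for key, value in row.items():
        if isinstance(value, str) and _JSON_NUMBER.fullmatch(value):
            encoded = value
        else:
            encoded = json.dumps(value)
        parts.append(json.dumps(str(key)) + ': ' + encoded)
    return '{' + ', '.join(parts) + '}'

print(dumps_verbatim({'foo': '-482.044'}))  # {"foo": -482.044}
print(dumps_verbatim({'x': '1.50', 'zip': '02139', 'n': 'abc'}))
# {"x": 1.50, "zip": "02139", "n": "abc"}
```

Note that 02139 stays quoted because JSON numbers may not have leading zeros.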
To put it differently: I want the input CSV to serve as the explicit specification of how to serialize whatever floating-point values it contains. Is there a simple/convenient way to achieve this?
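The comment mentioned at the top is not reproduced here, but pandas.read_csv does have a float_precision parameter aimed at exactly this conversion step. The sketch below assumes float_precision='round_trip' (documented as selecting a round-trip converter in the C parser), so each float field should parse to the same double as Python's float() would, and therefore serialize with the shortest repr, which for -482.044 is -482.044 itself. io.StringIO and the extra name column are stand-ins for /tmp/foo.csv and the mixed real-life columns. Depending on the pandas version, the default parser may already give the same result for this input.

```python
import io
import json

import pandas

csv_text = 'foo,name\n-482.044,alpha\n'

# 'round_trip' asks the C parser for a round-trip-accurate float conversion
df = pandas.read_csv(io.StringIO(csv_text), float_precision='round_trip')

rows = df.to_dict(orient='records')
print(json.dumps(rows[0]))  # {"foo": -482.044, "name": "alpha"}
```

This fixes the parsed value, not the spelling: a CSV entry written as 1.50 would still come out as 1.5.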