Pandas read scientific notation and change

Question

I have a dataframe in pandas that I'm reading in from a CSV.

One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23

My trouble is that as I read in the CSV, pandas views these data as an object dtype, not the float32 that it should be, I guess because it thinks the scientific notation entries are strings.

I've tried to convert the dtype using `df['speed'].astype(float)` after it's been read in, and tried to specify the dtype as it's being read in using `df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a'])`. This throws the error `ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...`

So far neither of these methods has worked. Am I missing an incredibly easy fix?

This question seems to suggest I can specify known values that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.

EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS

7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5

I can't reproduce that problem. Reading values in scientific notation seems to work fine. Can you provide a small sample dataset demonstrating the problem? Are you sure there isn't some other value in the data that is causing the error? — BrenBarn, Dec 01 '15 at 06:18
@BrenBarn, @Anton Protopopov, do you think it's the `Infinity` causing this? — hselbie, Dec 01 '15 at 16:54
By "tried to convert the dtype", do you mean you simply typed `df['speed'].astype(float)`? Because `df['speed'] = df['speed'].astype(float)` should have worked. — DSM, Dec 01 '15 at 16:59
`inf` will work, but not `Infinity`. There is [a bug report](https://github.com/pydata/pandas/issues/10065) asking for support for `Infinity`, but it's not handled yet. — BrenBarn, Dec 01 '15 at 18:27

score 2 · Answer 1 · edited May 23 '17 at 12:10

It's hard to say without seeing your data, but it seems the problem is that some rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0, you could use the following:

df1 = df.apply(pd.to_numeric, errors='coerce')

Then you could drop rows with NA values using dropna, or fill them with zeros using fillna.
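As a rough sketch of that approach, the snippet below coerces only the `speed` column, because coercing every column would also turn the timestamp strings into NaN. The values are copied from the sample in the question, except `"oops"`, which is a made-up unparseable entry added for illustration. Whether `"Infinity"` ends up as `inf` or `NaN` may depend on the pandas version, so the sketch makes no claim about it.

```python
import pandas as pd

# Strings as they might look when pandas falls back to object dtype.
# "oops" is a hypothetical bad value, not taken from the original data.
raw = pd.DataFrame(
    {"speed": pd.Series(
        ["0", "NaN", "0.14", "9.49242000872744e-05", "Infinity", "oops"],
        dtype=object,
    )}
)

# errors="coerce" replaces anything unparseable with NaN instead of raising.
raw["speed"] = pd.to_numeric(raw["speed"], errors="coerce")
print(raw["speed"].dtype)

# Either drop the rows that became NaN, or fill them with zeros.
dropped = raw.dropna(subset=["speed"])
filled = raw.fillna({"speed": 0.0})
```

Dropping and filling are alternatives rather than steps to run in sequence. Which one fits depends on whether a missing speed should count as zero in later calculations.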

score 1 · Accepted Answer · answered Dec 01 '15 at 17:35

I realised it was the `Infinity` value in my data causing the issue. Removing it with a find and replace worked.

@Anton Protopopov's answer also works, as did @DSM's comment pointing out that I had not assigned the result back with df['speed'] = df['speed'].astype(float).
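Both fixes can be sketched together on a small in-memory column instead of the real file. The values are taken from the sample rows in the question. Doing the find and replace inside pandas with `replace` is an assumption about how it might be done, not necessarily how it was done originally.

```python
import numpy as np
import pandas as pd

# Object-dtype column holding strings, mimicking what read_csv produced.
df = pd.DataFrame(
    {"speed": pd.Series(
        ["0", "NaN", "0.14", "9.49242000872744e-05", "Infinity"],
        dtype=object,
    )}
)

# Calling astype without assigning the result leaves the column unchanged.
df["speed"].astype(float)
print(df["speed"].dtype)  # still object

# Swap the "Infinity" spelling for "inf", which the comments report as supported.
df["speed"] = df["speed"].replace("Infinity", "inf")

# Assign the converted Series back so the dataframe actually changes.
df["speed"] = df["speed"].astype(float)
print(df["speed"].dtype)
```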

Thanks for the help.

score 0 · Answer 3 · answered Oct 15 '20 at 17:05

In my case, calling round() on the column worked.

df['column'] = df['column'].round(2)
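A small sketch of what this does, assuming the column is already a float dtype. Rounding changes the stored values rather than only how they are displayed, so very small numbers collapse to zero. The values come from the sample rows in the question, and the variable names are illustrative.

```python
import pandas as pd

# Float values copied from the question's sample data.
speeds = pd.Series(
    [0.14, 0.206836151554794, 9.49242000872744e-05, 113.318511172611],
    name="speed",
)

# round(2) keeps two decimal places, so 9.49242000872744e-05 becomes 0.0.
rounded = speeds.round(2)
print(rounded.tolist())
```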

onofricamila