Pandas reads integer column in scientific notation

Question

For some reason, when I import my CSV file with pd.read_csv, one of my integer columns (number of followers) is read in scientific notation, even though my values are whole numbers and clearly not in scientific notation.

Below is what I get when I call df["num_followers"].describe():

I've looked at all the answers for "suppress scientific notation" on here but haven't found any solution that works.

df['num_followers'].apply(lambda x: '{:.2f}'.format(x)) simply turned my values into str. I also tried converting with astype("float"), with no success: the values are still in scientific notation, which is messing up my calculations. Any ideas how I can change the column to int?

count    1.200000e+02
mean     4.959472e+04
std      3.816126e+05
min      0.000000e+00
25%      6.725000e+01
50%      2.165000e+02
75%      5.932500e+02
max      4.021842e+06
Name: num_followers, dtype: float64

EDIT

I tried one of the answers below, also without success:

IN: df_train = pd.read_csv("social_media_train.csv", index_col = [0])
df_train["num_followers"].describe()

OUT: count    5.760000e+02
mean     8.530724e+04
std      9.101485e+05
min      0.000000e+00
25%      3.900000e+01
50%      1.505000e+02
75%      7.160000e+02
max      1.533854e+07
Name: num_followers, dtype: float64

IN: df_train['num_followers'] = df_train['num_followers'].apply(np.int64)
df_train["num_followers"].describe()

OUT: count    5.760000e+02
mean     8.530724e+04
std      9.101485e+05
min      0.000000e+00
25%      3.900000e+01
50%      1.505000e+02
75%      7.160000e+02
max      1.533854e+07
Name: num_followers, dtype: float64

Possible duplicate of [dataframe.describe() suppress scientific notation](https://stackoverflow.com/questions/40347689/dataframe-describe-suppress-scientific-notation) — Georgy, Apr 05 '19 at 11:53
Using `df_train["num_followers"].describe().apply(lambda x: format(x, 'f'))` worked but how can I then keep the values in non-scientific notation after? If you call `describe()` after, you'll just see the same exponential values again — Marielle Dado, Apr 05 '19 at 12:00
Did you get an answer to this question? If so, could you please post the working solution? — Rajnish kumar, Sep 16 '20 at 11:54
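
On the follow-up about keeping the output non-scientific: for a plain integer or float column, describe() returns its statistics as float64, so the exponents are a matter of how pandas prints floats rather than a change to the stored data. A minimal sketch, assuming a small made-up Series in place of the real column (the values are hypothetical, and the two-decimal format is just one possible choice):

```python
import pandas as pd

# Hypothetical stand-in for the num_followers column from the question.
followers = pd.Series([0, 67, 216, 593, 4021842], name="num_followers")

# The summary statistics come back as floats, which is why large values
# print with exponents under the default display settings.
summary = followers.describe()

# Option 1: change how pandas prints floats for the rest of the session.
pd.set_option("display.float_format", "{:.2f}".format)
print(summary)

# Option 2: format one summary as strings; the underlying data is untouched.
formatted = summary.apply(lambda x: format(x, ".2f"))
print(formatted)

# Restore the default float display when it is no longer needed.
pd.reset_option("display.float_format")
```

Option 1 affects every later describe() call until it is reset, which seems closest to what the comment asks for. Option 2 only changes the one result it is applied to.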

alec_djinn · Answer 1 · 2019-04-05T11:32:34.467

0

You can use np.int64 with apply (https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html).

import numpy as np

df['num_followers'] = df['num_followers'].apply(np.int64)
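
One way to check whether the cast itself took effect is to look at the column's dtype separately from the dtype of the describe() result. A rough sketch, assuming a hypothetical DataFrame whose whole numbers were parsed as floats (the values here are made up, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: whole numbers that ended up stored as floats.
df = pd.DataFrame({"num_followers": [0.0, 39.0, 150.0, 716.0, 15338538.0]})

# The cast suggested in this answer.
df["num_followers"] = df["num_followers"].apply(np.int64)

print(df["num_followers"].dtype)             # dtype of the column itself
print(df["num_followers"].describe().dtype)  # dtype of the summary statistics
print(df["num_followers"].max())             # a single value from the column
```

If the first print reports an integer dtype while the second reports a float dtype, the column was converted and only the summary is being displayed with exponents.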

edited Apr 05 '19 at 11:32

answered Apr 05 '19 at 11:24

alec_djinn

This did not work, unfortunately. The column is still in exponential notation – Marielle Dado Apr 05 '19 at 11:29
@MarielleDado Are you sure? It works on my computer and I don't see why it should not in your case. – alec_djinn Apr 05 '19 at 11:40
Yes, I just updated my post with my results based on your answer – Marielle Dado Apr 05 '19 at 11:45

score 0 · Answer 2 · answered Apr 05 '19 at 11:43

Use the dtype= option in pd.read_csv, e.g.

df = pd.read_csv('filename.csv', dtype={'num_followers': np.int64})

You can of course specify dtypes for additional columns in the dict there.
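
A self-contained sketch of this approach, reading from an in-memory string instead of a file. The CSV contents and the `user` column are hypothetical. The second half assumes a pandas version that supports the nullable "Int64" dtype (capital I), which is one way to handle the blank-value case raised in the comments:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents; the real file is not shown in the question.
csv_text = "user,num_followers\na,0\nb,216\nc,4021842\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"num_followers": np.int64})
print(df["num_followers"].dtype)

# Assumption: the nullable "Int64" extension dtype is available. It lets an
# integer column hold missing values instead of failing on a blank cell.
csv_with_blank = "user,num_followers\na,0\nb,\nc,4021842\n"
df_blank = pd.read_csv(
    io.StringIO(csv_with_blank), dtype={"num_followers": "Int64"}
)
print(df_blank["num_followers"])
```

Note that either dtype only controls how the column is stored. Summary statistics computed from it can still be floats.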

answered Apr 05 '19 at 11:43

Ketil Tveiten

I already tried converting the data type after importing, with `astype()`. Would this be any different? – Marielle Dado Apr 05 '19 at 11:46
This will throw an annoying error such as `"Integer column has NA values in column 1"` if the column being set as an integer happens to have a blank value. – yeliabsalohcin Sep 16 '21 at 06:22
