As the title describes, when I try to import data from a .csv file, pandas takes it upon itself to modify one of my data columns significantly. My .csv file looks roughly like this:

Date, Price
2015-02-03 17:00:00, 20.95
2015-02-04 17:00:00, 20.927
2015-02-05 17:00:00, 21.322
2015-02-06 17:00:00, 22.158
...

So when I try to import this csv file, this is what I get:

In[2]: fname01 = os.path.join("Data", "myData.csv")
       dfMyData = pd.read_csv(fname01, usecols=["Date", "Price"], sep=',')
       print(dfMyData)
       print(dfMyData.dtypes)

Out[2]:
                     Date               Price
0     2015-02-03 17:00:00        2.095000e+01
1     2015-02-04 17:00:00        2.092700e+01
2     2015-02-05 17:00:00        2.132200e+01
3     2015-02-06 17:00:00        2.215800e+01

Date               object
Price              float64
dtype: object

As you can see in the Price column, pandas moves the decimal point to the left and goes crazy with the rest of the decimals. At least the type is still float64.
Can anyone tell me what is going on here, and how I can fix it?
Any help is greatly appreciated.

Jonas Svare
  • This is not moving the decimal point, this is scientific notation. For example, 2.1e+01 means 2.1 * 10, it is just a different way of displaying numbers. You can search for scientific notation if you are not familiar with this. – seermer Apr 12 '23 at 11:58
  • Okay, I will admit that I am not very good when it comes to scientific notations and all that. Is there any way to stop pandas from doing this? – Jonas Svare Apr 12 '23 at 12:00
  • [This](https://stackoverflow.com/a/41023046/14447614) is probably what you're looking for. – white Apr 12 '23 at 12:02
  • @JonasSvare I provided an article explaining this in my answer – Yannick Funk Apr 12 '23 at 12:02
  • I would agree that scientific notations are less straightforward. But I guess one reason it uses scientific notations is that the data contains numbers from very different ranges. This format would be cleaner if, for example, there are both 0.0001 and 1000000 in the column. – seermer Apr 12 '23 at 12:10

1 Answer

Pandas is displaying the numbers in scientific notation; the stored values are unchanged. One way to suppress this is to set the global display option:

pd.set_option('display.float_format', lambda x: '%.5f' % x)

This displays floats with 5 digits after the decimal point. See this article.
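If you would rather not change the setting globally, something like the following may work. It is only a sketch: the inline CSV text is a hypothetical stand-in for the question's file, and these four values on their own may not trigger scientific notation the way the full data set does. It checks that the parsed value is still 20.95 and applies fixed-point formatting only inside a `with` block.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Data/myData.csv from the question.
csv_text = """Date,Price
2015-02-03 17:00:00,20.95
2015-02-04 17:00:00,20.927
2015-02-05 17:00:00,21.322
2015-02-06 17:00:00,22.158
"""

df = pd.read_csv(io.StringIO(csv_text), usecols=["Date", "Price"])

# 2.095000e+01 is just another way of writing 20.95; nothing was modified.
same_value = float("2.095000e+01") == 20.95
first_price = df["Price"].iloc[0]

# Fixed-point display with 3 decimals, limited to this block only.
with pd.option_context("display.float_format", "{:.3f}".format):
    shown = df.to_string()

print(same_value)
print(shown)
```

Outside the `with` block the previous display settings apply again, so other output in the session is not affected.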

Yannick Funk