Scientific Notation Matplotlib / Pandas

Question

I have a CSV file with about 28 columns and 4000 rows. From two of these columns I want to plot about 50 specific rows. I used pandas to select this part of the file, but I cannot figure out how to make it read the numbers in scientific notation correctly.

My code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("20180416309.csv", sep=";")

x = df.loc[df[u'run#'] == 3, [u'     Diameter']].values
y = df.loc[df[u'run#'] == 3, [u'      dN/dlnD']].values

plt.plot(x, y)
plt.show()

So I am trying to plot the columns u' Diameter' and u' dN/dlnD' for the rows where the column u'run#' contains the number 3. Typing "x" or "y" in the IPython console gives the right numbers.

Unfortunately, the plot looks like this:

As you can see, the power of ten in the scientific notation of the numbers on the y-axis is ignored. How can I fix this? This is my first try using matplotlib and pandas, so please excuse my beginner question.

Edit:

The file's data looks like this:

run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;

Reading out my "x" or "y" data with the IPython console, the output is like this:

   [' +1,94251E+02'],
   [' +5,23981E+02'],
   [' +0,00000E+00'],
   [' +1,10525E+02'],
   [' +0,00000E+00'],
   [' +4,76363E+01'],
   [' +1,61714E+01'],
   [' +1,65482E+02'],
   [' +0,00000E+00'],
   [' +4,75312E+02'],
   [' +4,20174E+01']], dtype=object)

SOLUTION:

As you pointed out, the comma was the problem. I simply added the decimal setting in the code:

df = pd.read_csv("test.csv", sep=";", decimal=",")

Now the graph looks the way it is supposed to.
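For readers who want to reproduce the fix without the original file, here is a minimal sketch. It assumes the five sample rows shown in the edit above are representative of the whole file; the variable names `raw`, `df_default` and `df_comma` are only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question (the real file is not available here).
raw = """run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;
"""

# Without decimal=",", the comma-decimal values stay as text.
df_default = pd.read_csv(io.StringIO(raw), sep=";")

# With decimal=",", the same values are parsed as floats.
df_comma = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# The header keeps its leading spaces, so strip them before using the names.
df_default.columns = df_default.columns.str.strip()
df_comma.columns = df_comma.columns.str.strip()

print(df_default["Diameter"].dtype)  # a text dtype, not a float
print(df_comma["Diameter"].dtype)    # a float dtype
```

The trailing `;` on each line makes pandas add an empty extra column, which can simply be ignored or dropped.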

Thank you!

Are your values being read in as strings instead of actual floats? You likely need to convert them to numbers with `pd.to_numeric()`. It shouldn't have an issue with the `E+` notation — ALollz, Apr 17 '18 at 18:30
`..the scientific notation of the numbers on the y-axis is ignored` - Can you explain that better? Is the display format wrong? Is it being plotted incorrectly? Please include a minimal portion of the file's data (maybe 10-15 rows, a few relevant columns) that we can use to diagnose the problem, just copy and paste it into your question as text formatted as code. Please read [mcve]. — wwii, Apr 17 '18 at 18:32
You can rotate your xtick labels. https://stackoverflow.com/a/43969357/6361531 — Scott Boston, Apr 17 '18 at 18:44
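If the data has already been loaded as strings, the `pd.to_numeric()` route from the first comment also works, but only after the decimal comma is swapped for a period. The following is a rough sketch under that assumption; the three values are taken from the console output in the question, and the name `as_text` is made up for the example.

```python
import pandas as pd

# Three of the string values shown in the question's console output.
as_text = pd.Series([" +1,94251E+02", " +5,23981E+02", " +0,00000E+00"])

# Strip the padding, replace the decimal comma, then convert to numbers.
cleaned = as_text.str.strip().str.replace(",", ".", regex=False)
as_float = pd.to_numeric(cleaned)

print(as_float.tolist())
```

Passing `decimal=","` to `read_csv` is usually the simpler option, since it avoids the string clean-up entirely.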

score 0 · Accepted Answer · answered Apr 17 '18 at 20:01

It's clear that the CSV data wasn't read the way you expected. Based on your examples, all of your data was read as strings, including the numbers. The reason is that the numbers in your file use a comma as the decimal separator, which is not interpreted as a number unless the reader is told about it. I modified the small snippet of data you provided so that a period rather than a comma represents the decimal point, which is customary in my locale. As you can see, the data is then read into the dataframe properly.

df = pd.read_csv("d:\\users\\floyd\\documents\\sample.csv", sep=';'); df
Out[72]: 
   run#       Diameter        dN/dlnD
0    12        35.8151        1173.36
1    13        32.6913        6060.44
2    13        29.8524       17651.60
3    13        27.2704       48871.60
4    13        24.9202      100035.00

I also removed the annoying leading spaces in the column names with this.

df.columns = [col.strip() for col in df.columns]; df.columns

Now it plots properly.

plt.plot(df['Diameter'], df['dN/dlnD'])
Out[75]: [<matplotlib.lines.Line2D at 0x25ef97bd0b8>]
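Putting the pieces together, the sketch below keeps the comma-decimal file unchanged, strips the column names, selects one run and plots it. It is an illustration only: it reuses the sample rows from the question, picks run 13 because that is the run present in the sample, and uses the non-interactive `Agg` backend so it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question, decimal commas left as they are.
raw = """run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;
"""

df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
df.columns = df.columns.str.strip()  # remove the leading spaces in the names

# Select the rows of one run; a single column name gives a 1-D Series.
run = df[df["run#"] == 13]

fig, ax = plt.subplots()
(line,) = ax.plot(run["Diameter"], run["dN/dlnD"])
ax.set_xlabel("Diameter")
ax.set_ylabel("dN/dlnD")
```

With an interactive backend, calling `plt.show()` afterwards displays the figure.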
