Pandas dataframe float index and transpose error

Question

I'm trying to properly import data from a space delimited file into a pandas dataframe so that I can plot it properly. My data file looks like so:

Vmeas   -5.00E+000  -4.50E+000  -4.00E+000  -3.50E+000 ...
vfd3051 -3.20E-008  -1.49E-009  1.38E-008   -1.17E-008 ...
vfd3151 -3.71E-008  -6.58E-009  -6.58E-009  -6.58E-009 ...
vfd3251 -4.73E-008  3.59E-009   8.68E-009   -1.68E-008 ...
vfd3351 -2.18E-008  -3.71E-008  3.60E-009   -3.20E-008 ...

So the test location is originally in the rows with the columns increasing in voltage to the right to 20V.

My code to read the data file into the dataframe is:

if __name__ == '__main__':
    file_path = str(input("Enter the filename to open:  "))
    save = str(input('Do you wish to save a pdf of the IV plots? (y/n): '))
    df = pd.read_csv(file_path, index_col="Vmeas", delim_whitespace=True, header=0)
    df = df.T
    df.reset_index(inplace=True)
    df.index.names = ['Voltage']
    df.columns.names = ['Die_numbers']
    df.drop('index',axis=1, inplace=True)
    make_plots(df, save)

The actual plotting is done by:

def make_plots(df, save):
    voltage = np.arange(-5, 20, 0.5)
    plt.figure(figsize=(10, 7))
    for col in df:
        plt.plot(voltage, col, legend=False)
    plt.show()

At first, I ran into problems because pandas treated the voltage as a string and because pandas doesn't seem to play nicely with float indexes. My first attempt started the plot of the diode current-voltage relationship at 0 (https://i.stack.imgur.com/i2XOY.jpg). I then re-indexed the dataframe, but plotting still didn't work. Now I've re-indexed the dataframe and dropped the old index column, and when I check df.head() everything looks right:

Die_numbers       vfd3051       vfd3151           vfd3251          vfd3351  
Voltage                                                               
0                -3.202241e-08 -3.711351e-08 -4.728576e-08 -2.184733e-08   
1                -1.493095e-09 -6.580329e-09  3.594383e-09 -3.710431e-08   
2                 1.377107e-08 -6.581644e-09  8.683344e-09  3.595368e-09

except that now I keep running into a ValueError in mpl. I think this is related to the col values being strings instead of floats, which I don't understand because the currents were printing properly before.

Admittedly, I'm new to pandas but it seems like at every turn I am stopped, by my ignorance no doubt, but it's getting tiresome. Is there a better way to do this? Perhaps I should just ignore the first row of the logfile? Can I convert from scientific notation while reading the file in? Keep plugging away?

Thanks.

df.info() is:

Int64Index: 51 entries, 0 to 50
Columns: 1092 entries, vfd3051 to vfd6824
dtypes: float64(1092)

Everything seems to load into pandas correctly, but mpl doesn't like something in the data. The columns are floats, and I'm not using the integer index. If the column names had been added as my first row, the columns would be treated as str or object type. The error is:

 Traceback (most recent call last):
  File "D:\Python\el_plot_top_10\IV_plot_all.py", line 51, in <module>
    make_plots(df, save)
  File "D:\Python\el_plot_top_10\IV_plot_all.py", line 21, in make_plots
    plt.plot(voltage, col, legend=False)
  File "C:\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 2987, in plot
    ret = ax.plot(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\matplotlib\axes.py", line 4139, in plot
    for line in self._get_lines(*args, **kwargs):
  File "C:\Anaconda3\lib\site-packages\matplotlib\axes.py", line 319, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "C:\Anaconda3\lib\site-packages\matplotlib\axes.py", line 278, in _plot_args
    linestyle, marker, color = _process_plot_format(tup[-1])
  File "C:\Anaconda3\lib\site-packages\matplotlib\axes.py", line 131, in _process_plot_format
    'Unrecognized character %c in format string' % c)
ValueError: Unrecognized character f in format string

can you post your pandas/numpy versions? the soln looks fine to me. show ``df.info()`` as well. — Jeff, Aug 12 '14 at 12:33

score 0 · Answer 1 · answered Aug 13 '14 at 02:22

I figured out how to make this work entirely in pandas. Don't specify an index column or a header row. Transpose the dataframe and drop the index. Then take the first row of data, which holds the string titles you actually want for the columns, and assign it as the column names. Finally, reassign the dataframe to a slice that drops that first row of string names ('vfd3021' in my case).

After that, you're good to go. The columns are float and since my voltage range is fixed, I just create a list with arange when I plot.

if __name__ == '__main__':
    file_path = str(input("Enter the filename to open:  "))
    save = str(input('Do you wish to save a pdf of the IV plots? (y/n): '))

    df = pd.read_csv(file_path, delim_whitespace=True)

    df = df.T
    df.reset_index(inplace=True)
    df.index.names = ['Voltage']
    df.columns.names = ['Die_numbers']
    df.drop('index', axis=1, inplace=True)
    names = df.iloc[0].values
    df.columns = names
    df = df[1:]
    make_plots(df, save)
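
For comparison, here is a minimal sketch of a shorter route, assuming the file layout shown in the question: read the first column as the index, transpose, and convert the transposed index (the voltage strings from the header) to float. The names `load_iv` and `make_plots` and the inline sample are illustrative only, and `sep=r"\s+"` stands in for `delim_whitespace=True`, which newer pandas releases deprecate. The loop plots `df[col]` because iterating over a DataFrame yields the column labels, not the column data.

```python
import io

import pandas as pd


def load_iv(source):
    """Read a whitespace-delimited IV file: one row per die, one column per voltage."""
    # The first column (header "Vmeas") holds the die names, so use it as the index.
    df = pd.read_csv(source, sep=r"\s+", index_col=0)
    # After transposing, rows are voltages and columns are dies.
    df = df.T
    # The old column labels are strings such as "-5.00E+000"; make them floats.
    df.index = df.index.astype(float)
    df.index.name = "Voltage"
    df.columns.name = "Die_numbers"
    return df


def make_plots(df):
    """Plot every die's current against the voltage stored in the index."""
    # Imported here so load_iv can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 7))
    for col in df:  # col is the column label (a string), not the data
        ax.plot(df.index, df[col])
    ax.set_xlabel("Voltage")
    plt.show()


# Hypothetical sample mirroring the first rows of the file in the question.
sample = io.StringIO(
    "Vmeas   -5.00E+000  -4.50E+000  -4.00E+000  -3.50E+000\n"
    "vfd3051 -3.20E-008  -1.49E-009  1.38E-008   -1.17E-008\n"
    "vfd3151 -3.71E-008  -6.58E-009  -6.58E-009  -6.58E-009\n"
)
iv = load_iv(sample)
print(iv.dtypes)
```

Taking x from the index also avoids a length mismatch: `np.arange(-5, 20, 0.5)` stops at 19.5 and has 50 values, while `df.info()` in the question reports 51 rows.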

score -1 · Answer 2 · edited May 23 '17 at 10:28

As far as I can see, all your problems come from not getting your data into the correct format to begin with. Focus on importing the data first, then print what you're going to plot and check that the types are what you expect them to be.
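
As a hedged illustration of that kind of check, using a small stand-in frame rather than your real data: iterating over a DataFrame yields the column labels, so the loop variable `col` in `make_plots` is a string such as 'vfd3051'. matplotlib reads a trailing string argument as a format string, which would explain the "Unrecognized character f" message. Indexing with `df[col]` gives the float data instead.

```python
import pandas as pd

# Stand-in frame with two of the die columns from the question (values rounded).
df = pd.DataFrame({
    "vfd3051": [-3.20e-08, -1.49e-09, 1.38e-08],
    "vfd3151": [-3.71e-08, -6.58e-09, -6.58e-09],
})

for col in df:
    # col is the label; df[col] is the Series of currents you want to plot.
    print(repr(col), type(col).__name__, df[col].dtype)
```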

I would advise using a different method to import the data, as the file format is not what pandas works best with (e.g. it is transposed). For example, you could use numpy.genfromtxt; an introduction is given here.

import numpy as np
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`

data_file = (
"""Vmeas   -5.00E+000  -4.50E+000  -4.00E+000  -3.50E+000
vfd3051 -3.20E-008  -1.49E-009  1.38E-008   -1.17E-008
vfd3151 -3.71E-008  -6.58E-009  -6.58E-009  -6.58E-009
vfd3251 -4.73E-008  3.59E-009   8.68E-009   -1.68E-008
vfd3351 -2.18E-008  -3.71E-008  3.60E-009   -3.20E-008
""")

data = np.genfromtxt(StringIO(data_file), dtype=None)

print(data)

>>> array([('Vmeas', -5.0, -4.5, -4.0, -3.5),
       ('vfd3051', -3.2e-08, -1.49e-09, 1.38e-08, -1.17e-08),
       ('vfd3151', -3.71e-08, -6.58e-09, -6.58e-09, -6.58e-09),
       ('vfd3251', -4.73e-08, 3.59e-09, 8.68e-09, -1.68e-08),
       ('vfd3351', -2.18e-08, -3.71e-08, 3.6e-09, -3.2e-08)], 
      dtype=[('f0', 'S7'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])

So now we have a numpy array of tuples, with the row name as the first element of each tuple and the data as the rest of it. Most importantly, all the numbers are numbers. Try to avoid having strings, because conversions are messy.
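
To make that structure concrete, here is a small sketch, assuming Python 3 and a NumPy version whose `genfromtxt` accepts the `encoding` argument (so the names come back as `str` rather than bytes). The shortened `text` sample is made up for the example; the fields get the default names `f0`, `f1`, ... as in the dtype above.

```python
from io import StringIO

import numpy as np

text = (
    "Vmeas   -5.00E+000  -4.50E+000\n"
    "vfd3051 -3.20E-008  -1.49E-009\n"
    "vfd3151 -3.71E-008  -6.58E-009\n"
)

# encoding=None asks NumPy to return text fields as str.
data = np.genfromtxt(StringIO(text), dtype=None, encoding=None)

names = data["f0"].tolist()   # first field of every record: the row labels
first_values = data["f1"]     # second field: a float array, one value per row
print(names)
print(first_values.dtype)
```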

Then we could do the following to get a nice pandas DataFrame:

DataDictionary = {row[0]:list(row)[1:] for row in iter(data)}
pd.DataFrame(DataDictionary)

First we create a dictionary of the data using a Python dictionary comprehension, then pass it into the DataFrame. This results in a well-behaved dataframe with columns named by the strings "Vmeas" and "vfd*", and a default integer index.

   Vmeas       vfd3051       vfd3151       vfd3251       vfd3351
0   -5.0 -3.200000e-08 -3.710000e-08 -4.730000e-08 -2.180000e-08
1   -4.5 -1.490000e-09 -6.580000e-09  3.590000e-09 -3.710000e-08
2   -4.0  1.380000e-08 -6.580000e-09  8.680000e-09  3.600000e-09
3   -3.5 -1.170000e-08 -6.580000e-09 -1.680000e-08 -3.200000e-08
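
One possible next step, sketched under the assumption that the frame looks like the one above (the values are copied from it and shortened to two dies): make `Vmeas` the index so the voltages travel with the data instead of being rebuilt with `arange` at plot time.

```python
import pandas as pd

# Same layout as the frame above, shortened to two dies.
df = pd.DataFrame({
    "Vmeas": [-5.0, -4.5, -4.0, -3.5],
    "vfd3051": [-3.20e-08, -1.49e-09, 1.38e-08, -1.17e-08],
    "vfd3151": [-3.71e-08, -6.58e-09, -6.58e-09, -6.58e-09],
})

# A float index is fine here; label lookups use the voltage directly.
iv = df.set_index("Vmeas")
print(iv.loc[-4.5, "vfd3051"])
```

From there, something like `iv.plot(legend=False)` should draw every die against voltage in one call.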

I doubt this will fully answer your question, but it is a start on getting the data correct before plotting it, which I think was your problem. Try to keep it as simple as possible!

If you are going to put it in a frame, then use the pandas parsing methods (they are MUCH faster than genfromtxt), not to mention that there is no need to iterate and convert to a list. — Jeff, Aug 12 '14 at 11:25
I agree it will be faster, but I could not find any method in the docs to transpose the data before reading it in. I would be happy to see alternative answers, as I'm sure there are better methods, but as far as I can tell there is nothing actually wrong with what I have done? — Greg, Aug 12 '14 at 11:54
Of course this transposes the data, but because "Vmeas" is in the same row of the initial data, it ends up in the same column of the transposed data. As a result, values like "-5.00E+000" are not converted, and this was causing issues for the OP. — Greg, Aug 12 '14 at 12:15
I am not sure what his issue is; the exact soln in his question works for me. — Jeff, Aug 12 '14 at 12:32
Greg, thanks for your efforts and explanation. If I can't get the mpl error worked out I'll do this. Will mark it as answer if I do. — zeppelin_d, Aug 12 '14 at 18:26
