I would like to plot parallel coordinates for a pandas DataFrame that contains both numeric columns and columns with string values.
Problem description
I have the following test code, which works for plotting parallel coordinates when all variables are numeric:
import pandas as pd
import matplotlib.pyplot as plt
from pandas.tools.plotting import parallel_coordinates
df = pd.DataFrame(
    [["line 1", 20, 30, 100],
     ["line 2", 10, 40, 90],
     ["line 3", 10, 35, 120]],
    columns=["element", "var 1", "var 2", "var 3"])
parallel_coordinates(df,"element")
plt.show()
This produces the following plot:
However, what I would like to do is add some variables to my plot whose values are strings. When I run the following code:
df2 = pd.DataFrame(
    [["line 1", 20, 30, 100, "N"],
     ["line 2", 10, 40, 90, "N"],
     ["line 3", 10, 35, 120, "N-1"]],
    columns=["element", "var 1", "var 2", "var 3", "regime"])
parallel_coordinates(df2,"element")
plt.show()
I get this error:
ValueError: invalid literal for float(): N
I suppose this means that the parallel_coordinates function does not accept strings.
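One workaround I can imagine, as a rough sketch only, is to replace each string value with an integer code before plotting, so that every plotted column is numeric. The helper below, encode_categories, is a hypothetical name of my own and not part of pandas; it uses only plain Python so that it should also run on old interpreters.

```python
def encode_categories(values):
    """Map each distinct value to an integer code, in order of first appearance.

    Hypothetical helper for illustration; it is not a pandas function.
    Returns (codes, labels), where labels[code] is the original value.
    """
    labels = []   # distinct values, indexed by their code
    codes = []    # one integer code per input value
    lookup = {}   # value -> code
    for value in values:
        if value not in lookup:
            lookup[value] = len(labels)
            labels.append(value)
        codes.append(lookup[value])
    return codes, labels


# Values of the "regime" column from df2 above
codes, labels = encode_categories(["N", "N", "N-1"])
print(codes)   # [0, 0, 1]
print(labels)  # ['N', 'N-1']
```

The codes could then replace the regime column before calling parallel_coordinates. I have not verified this on pandas 0.9.0, and the axis would show 0 and 1 rather than the original labels unless the ticks are relabelled from labels.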
Example of what I am trying to do
I am attempting to do something like this example, where Race and Sex are strings and not numbers:
Question
Is there any way to produce such a plot using pandas parallel_coordinates? If not, how could I create it? Maybe with matplotlib?
I should mention that I am specifically looking for a solution that works under Python 2.5 with pandas version 0.9.0.