
I need to draw a parallel coordinates plot of a dataset whose columns have very different ranges. While searching, I found a beautiful JavaScript example on this website.

I have created a sample dataset for testing and would like to produce a parallel plot with y-axis ticks and a separate range on each y axis, similar to this image:

So far I have done this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
np.random.seed(100)

%matplotlib inline

df = pd.DataFrame({'calcium': np.random.randint(0,7,5),
                   'calories': np.random.randint(200,900,5),
                   'fiber': np.random.randint(10,75,5),
                   'potassium': np.random.randint(0,20,5)
                  })
df = df.T
df['name'] = df.index

df = df.reset_index(drop=True)  # reset_index returns a new frame, so assign it

parallel_coordinates(df,'name')

The output is this:

As we can see, the lower curves are very hard to tell apart. I would like to fix that. I have searched for a way to change the tick marks on the vertical axes and to rescale (normalize) their ranges.

Any help will be appreciated. This is a beautiful plot; kudos to anyone who manages to reproduce it in Python!

Related links:
http://bl.ocks.org/syntagmatic/raw/3150059/
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.plotting.parallel_coordinates.html
https://pandas.pydata.org/pandas-docs/stable/visualization.html
How to plot parallel coordinates on pandas DataFrame with some columns containing strings?

Update

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
np.random.seed(100)

plt.style.use('ggplot')
%matplotlib inline

df = pd.DataFrame({'calcium': np.random.randint(0,7,5),
                   'calories': np.random.randint(200,900,5),
                   'fiber': np.random.randint(10,75,5),
                   'potassium': np.random.randint(0,20,5),
                   'name': ['apple','banana','orange','mango','watermelon']
                  })
ax = parallel_coordinates(df,'name')
ax.grid(True)
ax.set_yscale('log')


I still cannot put y-tick marks on the middle axes.

  • Have you thought about using a logarithmic y axis? It is useful when the range of values is too wide to plot it in a linear axis. – b-fg Oct 24 '18 at 14:21
  • Yes, that would spread out the tick labels on the first y axis, but I would have to do more to get y ticks on all the other axes. My aim is to get a plot like the one from bl.ocks.org –  Oct 24 '18 at 14:23
  • But you are plotting all your elements in a single plot, they use multiple subplots. Are you asking for how to plot multiple plots with different y axis range? – b-fg Oct 24 '18 at 14:26
  • Maybe my wording was unclear. Please see the first image in the question: whatever method they used to get that beautiful picture, I am trying to get a figure like theirs. Also, `ax.set_yscale('log')` was a good idea. I appreciate the suggestion. –  Oct 24 '18 at 14:30
  • Have a look [here](https://matplotlib.org/gallery/subplots_axes_and_figures/subplot.html) on how to create multiple subplots (one for each element with different yaxis ranges). This is how you could make the figure you want. – b-fg Oct 24 '18 at 15:01
  • I do not want to be a grumbler, but the plot you show as an example is overloaded and not very useful, imho. Just because it looks cool doesn't mean that it is an informative plot. – Moritz Oct 24 '18 at 18:05
  • I suppose beauty is in the eye of the beholder. That plot is incredibly busy, hard to interpret, and carries a lot of unintended implications. Line plots carry implicit notions about the continuity of the independent variable. *DO* calories come after calcium? Alphabetically, sure. In any other meaningful way? Not really. Why should it come after calcium? Plots should be *as simple as possible* to convey the information you're trying to convey – PMende Oct 24 '18 at 18:05
  • @PMende Agreed. Here I am learning to code and learning the technicalities; aesthetics vs. usefulness is another topic to discuss. –  Oct 24 '18 at 18:09
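One way to get a separately scaled y axis for every column, which the comments above point towards, is sketched below. Note that it does not use separate subplots as suggested: instead it rescales each column to [0, 1], draws all lines on a single host axes, and overlays one `twinx()` axes per column purely to show that column's real tick values. The function name `parallel_plot` and its parameters are made up for this illustration, and it assumes at least two numeric columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def parallel_plot(df, class_col):
    """Parallel-coordinates sketch with an independently scaled y axis per column.

    Hypothetical helper for illustration; assumes >= 2 numeric columns.
    """
    cols = [c for c in df.columns if c != class_col]
    values = df[cols].astype(float)
    mins = values.min()
    spans = (values.max() - mins).replace(0, 1.0)  # guard against constant columns
    scaled = (values - mins) / spans               # every column now spans [0, 1]

    x = np.arange(len(cols))
    fig, host = plt.subplots(figsize=(8, 4))
    for label, row in zip(df[class_col], scaled.to_numpy()):
        host.plot(x, row, label=label)             # one line per data row
    host.set_xlim(x[0], x[-1])
    host.set_ylim(0, 1)
    host.set_xticks(x)
    host.set_xticklabels(cols)
    host.yaxis.set_visible(False)                  # the 0..1 scale is internal only

    twins = []
    for i, col in enumerate(cols):
        twin = host.twinx()
        # Show the column's real range on this axis
        twin.set_ylim(mins[col], mins[col] + spans[col])
        # Move the axis line (and its ticks) to the column's x position
        twin.spines["right"].set_position(("axes", i / (len(cols) - 1)))
        twin.yaxis.set_ticks_position("right")
        for side in ("top", "bottom", "left"):
            twin.spines[side].set_visible(False)
        twins.append(twin)

    fig.subplots_adjust(right=0.78)                # leave room for the legend
    host.legend(loc="center left", bbox_to_anchor=(1.1, 0.5))
    return fig, host, twins


# Usage with the sample data from the update above
np.random.seed(100)
df = pd.DataFrame({'calcium': np.random.randint(0, 7, 5),
                   'calories': np.random.randint(200, 900, 5),
                   'fiber': np.random.randint(10, 75, 5),
                   'potassium': np.random.randint(0, 20, 5),
                   'name': ['apple', 'banana', 'orange', 'mango', 'watermelon']})
fig, host, twins = parallel_plot(df, 'name')
```

In a notebook with `%matplotlib inline` the figure is displayed automatically; in a script, call `plt.show()` afterwards. The tick labels sit to the right of each vertical axis, so they can overlap the lines when the plot is narrow.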

1 Answer


This is a solution that will help improve readability by using a broken y axis. I stole most of this code from here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)

%matplotlib inline

df = pd.DataFrame({'calcium': np.random.randint(0,7,5),
                   'calories': np.random.randint(200,900,5),
                   'fiber': np.random.randint(10,75,5),
                   'potassium': np.random.randint(0,20,5)
                  })

f, (ax, ax2) = plt.subplots(2, 1, sharex=True)

#plot the same data on both axes
ax.plot(df)
ax2.plot(df)

# zoom-in / limit the view to different portions of the data
ax.set_ylim(250, 800)  # outliers only
ax2.set_ylim(0, 75)  # most of the data

# hide the spines between ax and ax2
ax.spines['bottom'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax.xaxis.tick_top()
ax.tick_params(labeltop=False)  # don't put tick labels at the top
ax2.xaxis.tick_bottom()

d = .015  # how big to make the diagonal lines in axes coordinates
kwargs = dict(transform=ax.transAxes, color='k', clip_on=False)
ax.plot((-d, +d), (-d, +d), **kwargs)        # top-left diagonal
ax.plot((1 - d, 1 + d), (-d, +d), **kwargs)  # top-right diagonal

kwargs.update(transform=ax2.transAxes)  # switch to the bottom axes
ax2.plot((-d, +d), (1 - d, 1 + d), **kwargs)  # bottom-left diagonal
ax2.plot((1 - d, 1 + d), (1 - d, 1 + d), **kwargs)  # bottom-right diagonal


f.subplots_adjust(left=0.1, right=1.6,
                  bottom=0.1, top=0.9,
                  hspace=0.3)  # space between the two sections
f.legend(df.columns)

plt.show()

Which produces a plot that looks like this:

I still think that the calcium line is challenging to interpret, but you could enlarge the image or break the y axis again if the graph is simple enough to split into chunks.
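Breaking the y axis a second time could look roughly like the sketch below, which stacks three panels instead of two. The three y ranges are hypothetical values picked by eye for this sample data, and the diagonal break marks from the code above are left out to keep it short.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(100)
df = pd.DataFrame({'calcium': np.random.randint(0, 7, 5),
                   'calories': np.random.randint(200, 900, 5),
                   'fiber': np.random.randint(10, 75, 5),
                   'potassium': np.random.randint(0, 20, 5)})

# Hypothetical y ranges, top panel first; adjust them to your own data
y_ranges = [(200, 900), (20, 80), (0, 20)]

fig, axes = plt.subplots(len(y_ranges), 1, sharex=True, figsize=(6, 8))

# Plot the same data in every panel, then zoom each panel to its own range
for ax, (lo, hi) in zip(axes, y_ranges):
    ax.plot(df)
    ax.set_ylim(lo, hi)

# Hide the spines and x ticks between adjacent panels
for upper, lower in zip(axes[:-1], axes[1:]):
    upper.spines['bottom'].set_visible(False)
    lower.spines['top'].set_visible(False)
    upper.tick_params(bottom=False, labelbottom=False)

# One legend for the whole figure, using the lines of the top panel as handles
fig.legend(axes[0].get_lines(), list(df.columns))
```

Each extra break makes slopes harder to compare across panels, so this is only worth doing when the columns fall into a few clearly separated value ranges.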

J_Heads
  • `f.legend(labels=df.columns) TypeError: legend() missing 1 required positional argument: 'handles'` –  Oct 24 '18 at 17:36
  • I'm on matplotlib 2.2.2. You could hack around it by replacing `ax.plot(df)` with `for col in df.columns: ax.plot(df[col], label=col)` and replacing `f.legend(df.columns)` with `f.legend()` – J_Heads Oct 24 '18 at 17:48
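The workaround from the last comment, written out in full, might look like this. It is a minimal sketch, assuming a matplotlib version in which `Figure.legend()` called without arguments collects the labeled lines from all axes of the figure; only the top axes gets labels here so that each column appears once in the legend.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(100)
df = pd.DataFrame({'calcium': np.random.randint(0, 7, 5),
                   'calories': np.random.randint(200, 900, 5),
                   'fiber': np.random.randint(10, 75, 5),
                   'potassium': np.random.randint(0, 20, 5)})

f, (ax, ax2) = plt.subplots(2, 1, sharex=True)

# Label every line explicitly on the top axes ...
for col in df.columns:
    ax.plot(df[col], label=col)
# ... and leave the bottom axes unlabeled to avoid duplicate legend entries
ax2.plot(df)

ax.set_ylim(250, 800)  # outliers only
ax2.set_ylim(0, 75)    # most of the data

# No handles or labels needed: they are picked up from the labeled lines
legend = f.legend()
```

Both axes cycle through the same default colors in column order, so the legend colors match the lines in the bottom panel as well.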