I'm looking to take a pandas DataFrame with a bunch of timelines in it and plot them in a single figure. The DataFrame's index holds Timestamps, and there's a specific column, which we'll call "sequence", that contains strings like "A" and "B". So the DataFrame looks something like this:

+--------------------------+---+
| 2014-07-01 00:01:00.0000 | A |
+--------------------------+---+
| 2014-07-01 00:02:00.0000 | B |
+--------------------------+---+
| 2014-07-01 00:04:00.0000 | A |
+--------------------------+---+
| 2014-07-01 00:08:00.0000 | A |
+--------------------------+---+
| 2014-07-01 00:08:00.0000 | B |
+--------------------------+---+
| 2014-07-01 00:10:00.0000 | B |
+--------------------------+---+
| 2014-07-01 00:11:00.0000 | B |
+--------------------------+---+

I'm looking for a plot something like this:

B |  *     * **
A | *  *   *
  +------------
    Timestamp
Agrajag9

1 Answer

I would just map each category to a y-value using a dictionary.

import random
import matplotlib.pyplot as plt
import pandas

categories = list('ABCD')

# map categories to y-values
cat_dict = dict(zip(categories, range(1, len(categories)+1)))

# map y-values to categories
val_dict = dict(zip(range(1, len(categories)+1), categories))

# set up the dataframe (date_range replaces the removed DatetimeIndex(start=..., end=...) constructor)
dates = pandas.date_range(start='2012-05-05 13:00', end='2012-05-05 18:59', freq='20min')
values = [random.choice(categories) for _ in range(len(dates))]
df = pandas.DataFrame(data=values, index=dates, columns=['category'])

# determine the y-values from the categories
df['plotval'] = df['category'].map(cat_dict)

# make the plot
fig, ax = plt.subplots()
df['plotval'].plot(ax=ax, style='ks')
ax.margins(0.2)

# put one tick on each category's y-value and label it with the category name
ax.set_yticks(list(val_dict))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, pos: val_dict.get(x, '')))

And I get a plot with a black square marker at each timestamp, on one y-level per category (image not included).

Note that since you probably already have a dataframe, you can build cat_dict and val_dict from the column itself:

# unique categories, in order of first appearance
categories = pandas.unique(df['category'])

# map categories to y-values
cat_dict = dict(zip(categories, range(1, len(categories)+1)))

# map y-values to categories
val_dict = dict(zip(range(1, len(categories)+1), categories))
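If you do this in more than one place, the two lookups can be wrapped in a small helper. This is only a sketch: build_category_maps is a hypothetical name, not part of pandas, and sorting the categories (so that "A" ends up below "B") is my assumption about the ordering you want.

```python
import pandas


def build_category_maps(series):
    """Return (cat_dict, val_dict) for the unique values in a column.

    Hypothetical helper: categories are sorted so the y-order is stable
    regardless of which value happens to appear first in the data.
    """
    categories = sorted(pandas.unique(series))
    # category -> y-value, starting at 1
    cat_dict = {cat: i for i, cat in enumerate(categories, start=1)}
    # y-value -> category, the inverse lookup used for tick labels
    val_dict = {i: cat for cat, i in cat_dict.items()}
    return cat_dict, val_dict


df = pandas.DataFrame({'category': ['A', 'B', 'A', 'A', 'B', 'B', 'B']})
cat_dict, val_dict = build_category_maps(df['category'])
df['plotval'] = df['category'].map(cat_dict)
```

Dropping the sorted() call gives the order of first appearance instead, which matches what pandas.unique returns.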
Paul H
  • Good idea! But does this work, too, if you have two categories at the same time step (see plot example from Agrajag9)? – user3017048 Aug 30 '15 at 07:28
  • How do you put the categories in the dataframe then? For me it seems that it raises an error if you either try to put two values in one column...both in the 'category' and - even worse - in the 'plotval'. I thought I have to create as many nan-filled columns as there are observation categories and put the y values in the respective columns. – user3017048 Aug 30 '15 at 16:47
  • I would just add another row to the dataframe. @user3017048 – Paul H Aug 30 '15 at 19:08
  • Ah, okay, got what you mean! Thanks! – user3017048 Aug 31 '15 at 06:38
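
For the duplicate-timestamp case raised in the comments, here is a rough sketch using the timestamps from the question, where 00:08 appears once for "A" and once for "B". It assumes one row per observation, as suggested above, and calls ax.plot directly rather than Series.plot; that is my choice for the sketch, not something from the answer. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas

# one row per observation, so 00:08 simply appears twice in the index
index = pandas.to_datetime([
    '2014-07-01 00:01', '2014-07-01 00:02', '2014-07-01 00:04',
    '2014-07-01 00:08', '2014-07-01 00:08', '2014-07-01 00:10',
    '2014-07-01 00:11',
])
df = pandas.DataFrame({'sequence': ['A', 'B', 'A', 'A', 'B', 'B', 'B']},
                      index=index)

# map categories to y-values and back
cat_dict = {'A': 1, 'B': 2}
val_dict = {v: k for k, v in cat_dict.items()}
df['plotval'] = df['sequence'].map(cat_dict)

# markers only, so repeated x-values are not a problem
fig, ax = plt.subplots()
ax.plot(df.index, df['plotval'], 'ks')
ax.set_yticks(list(val_dict))
ax.set_yticklabels([val_dict[v] for v in val_dict])
ax.margins(0.2)
```

Call plt.show() or fig.savefig(...) afterwards to look at the result.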