
I am learning to use matplotlib with pandas and I am having a little trouble with it. I have a DataFrame whose rows (index) are districts and whose columns are coffee shops. Each cell holds the start date of that coffee shop in that district.

          starbucks    cafe-cool      barista   ........    60 shops
dist1     2008-09-18  2010-05-04     2007-02-21   ...............
dist2     2007-06-12  2011-02-17       
dist3
.
.
100 districts

I want to draw a scatter plot with the dates on the x axis and the coffee shops on the y axis. Since I couldn't figure out a direct one-line way to plot this, I extracted the coffee shops as one list and the dates as another.

shops = list(df.columns.values)
dt = pd.DataFrame(df.ix['dist1'])
dates = dt.set_index('dist1')

First I tried plt.plot(dates, shops) and got "ZeroDivisionError: integer division or modulo by zero". I could not figure out the reason for it. Some posts said the data should be numeric, so I replaced the shop names with numbers, planning to relabel them later with the ytick function.

y = [1, 2, 3, 4, 5, 6,...60] 

Still, plt.plot(dates, y) threw the same ZeroDivisionError. If I could get past this, maybe I would be able to label the plot using the tick function. Source - http://matplotlib.org/examples/ticks_and_spines/ticklabels_demo_rotation.html

I am trying to plot the graph for only the first row (dist1). For that I fetched the first row as a dataframe, df1 = df.ix[1], and then used the following:

for badges, dates in df.iteritems():

    date = dates

    ax.plot_date(date, yval)

    # Record the number and label of the coffee shop
    label_ticks.append(yval)
    label_list.append(badges)
    yval+=1 

I got an error at the line ax.plot_date(date, yval) saying x and y should have the same first dimension. Since I am plotting one coffee shop at a time for dist1, shouldn't the length always be one for both x and y? PS: date is a datetime.date object.

user3527975
  • Are the dates "2008-09-18" being passed as datetime objects or strings? It seems to me that you should iterate through each coffee shop, can you give a minimum working example for just one coffee shop? – Greg Aug 04 '14 at 10:48
  • You could have the x axis as dates and the y axis as districts, then use a third variable (represented by different colours) to outline your 60 different shops. – GCien Aug 04 '14 at 10:53
  • @Greg : The dates are being passed as string objects. What do you mean by working example for one coffee shop? – user3527975 Aug 04 '14 at 14:29
  • Once you get your dates figured out there is also a nice way to map categories to numeric values using dicts in the answer [here](http://stackoverflow.com/questions/25041905/matplotlib-timelines) – Ajean Aug 04 '14 at 15:24
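
The two suggestions in the comments above (turn the date strings into datetime objects, and map the shop names to numbers with a dict) can be sketched roughly as follows. The sample values and the names `date_strings`, `shops` and `shop_to_y` are made up for illustration and are not from the question's real data.

```python
import pandas as pd

# Hypothetical sample values in the same string format as the question
date_strings = ["2008-09-18", "2010-05-04", "2007-02-21"]
shops = ["starbucks", "cafe-cool", "barista"]

# Strings -> pandas Timestamps, which matplotlib can place on a date axis
dates = pd.to_datetime(date_strings, format="%Y-%m-%d")

# Category -> numeric y position, starting at 1
shop_to_y = {shop: i for i, shop in enumerate(shops, start=1)}
y = [shop_to_y[shop] for shop in shops]

print(dates)
print(shop_to_y)
```

The numeric positions in `y` are what get plotted; the shop names only come back in as tick labels.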

1 Answer


To achieve this you need to convert the date strings to datetime objects, for example with datetime.strptime. As mentioned in the comments, you also need to map the coffee shops onto some numbering system and then change the tick labels accordingly.

Here is an attempt:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime

def get_datetime(string):
    "Converts string '2008-05-04' to datetime"
    return datetime.strptime(string, "%Y-%m-%d")

# Generate dataframe (None marks a shop with no start date in that district;
# all columns must have the same length as the index)
df = pd.DataFrame(dict(
             starbucks=["2008-09-18", "2007-06-12"],
             cafe_cool=["2010-05-04", "2011-02-17"],
             barista=["2007-02-21", None]),
             index=["dist1", "dist2"])

ax = plt.subplot(111)

label_list = []
label_ticks = []
yval = 1 # numbering system

# Iterate through coffee shops, one column at a time
# (items() replaces iteritems(), which was removed in pandas 2.0)
for coffee_shop, dates in df.items():

    # Convert strings into datetime list, skipping missing entries
    datetimes = [get_datetime(date) for date in dates.dropna()]

    # Create list of yvals [yval, yval, ...] to plot against
    yval_list = np.zeros(len(datetimes))+yval

    # plot() with a marker does the same job as the older plot_date()
    ax.plot(datetimes, yval_list, "o")

    # Record the number and label of the coffee shop
    label_ticks.append(yval)
    label_list.append(coffee_shop)

    yval+=1 # Change the number so they don't all sit at the same y position

# Now set the yticks appropriately
ax.set_yticks(label_ticks)
ax.set_yticklabels(label_list)

# Set the limits so we can see everything
ax.set_ylim(ax.get_ylim()[0]-1,
            ax.get_ylim()[1]+1)

plt.show()
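
If only one district is wanted, as in the question, a variant along these lines may be closer to the goal. Iterating over the columns of the DataFrame yields a whole column each time (one value per district), which is the likely reason a single scalar y value did not match the length of x. Selecting the row first gives one date per shop. This is a sketch on made-up sample data, not the asker's real frame; the `Agg` backend line is only there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical two-district sample; None marks a missing start date
df = pd.DataFrame(
    {
        "starbucks": ["2008-09-18", "2007-06-12"],
        "cafe_cool": ["2010-05-04", "2011-02-17"],
        "barista": ["2007-02-21", None],
    },
    index=["dist1", "dist2"],
)

# Select one district as a Series indexed by shop name, dropping missing shops
row = df.loc["dist1"].dropna()
dates = pd.to_datetime(row, format="%Y-%m-%d")

# One y position per shop, so x and y have the same length
yvals = list(range(1, len(dates) + 1))

fig, ax = plt.subplots()
ax.plot(dates, yvals, "o")
ax.set_yticks(yvals)
ax.set_yticklabels(dates.index)
ax.set_ylim(0, len(dates) + 1)
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
# Call plt.show() or fig.savefig(...) here to view the result
```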
Greg
  • This solves my problem. I still don't understand why I got the ZeroDivisionError, though. – user3527975 Aug 04 '14 at 16:20
  • I can't reproduce it with what you have given. Post a working example that gives the error and that may be easier to diagnose. – Greg Aug 04 '14 at 16:26