
I am trying to plot a bubble chart (a scatter plot with dynamically sized markers). With random data the plot works fine, but when I parse my input file I cannot get it to plot.

Input:

Nos,Place,Way,Name,00:00:00,12:00:00
123,London,Air,Apollo,342,972
123,London,Rail,Beta,2352,342
123,Paris,Bus,Beta,545,353
345,Paris,Bus,Rava,652,974
345,Rome,Bus,Rava,2325,56
345,London,Air,Rava,2532,9853
567,Paris,Air,Apollo,545,544
567,Rome,Rail,Apollo,5454,5
876,Japan,Rail,Apollo,644,54
876,Japan,Bus,Beta,45,57

Program:

import pandas as pd
from pandas import DataFrame
import pandas.io.data
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


df=pd.read_csv('text_2.csv')


#SIZE OF BUBBLES CHANGES

fig = plt.figure()

ax = fig.add_subplot(1,1,1)

ax.scatter(df['Place'],df['Name'], s=df['00:00:00']) # Added third variable income as size of the bubble


plt.show()

I want Place on the x axis, Name on the y axis, and the bubble size taken from the count in the 00:00:00 column. I could not find many examples of sizable bubble plots. Any suggestions are appreciated. Thanks in advance. Why do I get an error at the 00:00:00 column, and how do I pass the values of that column?

Error:

Traceback (most recent call last):
  File "Bubble_plot.py", line 18, in <module>
    ax.scatter(df['Place'],df['Name'], s=df['00:00:00']) # Added third variable income as size of the bubble
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 6266, in scatter
    x, y, s, c = cbook.delete_masked_points(x, y, s, c)
  File "/usr/lib/pymodules/python2.7/matplotlib/cbook.py", line 1774, in delete_masked_points
    raise ValueError("First argument must be a sequence")
ValueError: First argument must be a sequence
JohnE
  • What exactly is your question? There is not a single question mark in your post. What fails? Please be more specific. – hitzg Jun 26 '15 at 10:35
  • Well isn't the column called `'00:00:00'` (and not `'00:00'`)?! – hitzg Jun 26 '15 at 11:40
  • You can't make a scatter plot with strings as coordinates. – hitzg Jun 26 '15 at 11:52
  • @mwaskom -- I put the seaborn tag back b/c it looks like seaborn heatmap may be a useful solution. Please re-delete the seaborn tag if you don't want it showing up there. – JohnE Jun 26 '15 at 17:45
  • @JohnE Seaborn solution is good .. let the tag be. –  Jun 26 '15 at 17:47
  • Thank you :) Appreciate the information .. ! –  Jun 26 '15 at 18:05

1 Answer


I was hoping this would work just by converting 'Name' and 'Place' to categoricals, but no luck there (with either plot or seaborn). It basically works if you convert them to integer codes, but then you lose the labels that you'd have with strings or categoricals. FWIW:

df2 = df.copy()
for c in ['Place','Name']:
    df2[c] = df2[c].astype('category').cat.codes

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(df2['Place'],df2['Name'], s=df2['00:00:00'])

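If losing the labels is a problem, one way around it is to put the category names back on the ticks by hand. This is only a sketch of that idea, not something from the code above: the inlined CSV, the `alpha` value and the `Agg` backend line are my own assumptions so that it runs standalone (drop the backend line if you want an interactive window). Newer matplotlib releases (2.1 and later, as far as I know) also accept strings directly in `scatter`, so this may be unnecessary on a current install.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV = """Nos,Place,Way,Name,00:00:00,12:00:00
123,London,Air,Apollo,342,972
123,London,Rail,Beta,2352,342
123,Paris,Bus,Beta,545,353
345,Paris,Bus,Rava,652,974
345,Rome,Bus,Rava,2325,56
345,London,Air,Rava,2532,9853
567,Paris,Air,Apollo,545,544
567,Rome,Rail,Apollo,5454,5
876,Japan,Rail,Apollo,644,54
876,Japan,Bus,Beta,45,57
"""
df = pd.read_csv(io.StringIO(CSV))

# Keep the categoricals around so the labels can be recovered later.
place = df["Place"].astype("category")
name = df["Name"].astype("category")

fig, ax = plt.subplots()
# Integer codes give numeric coordinates; the count column sets the bubble size.
ax.scatter(place.cat.codes, name.cat.codes, s=df["00:00:00"], alpha=0.5)

# Code i corresponds to categories[i], so tick i gets that label.
ax.set_xticks(range(len(place.cat.categories)))
ax.set_xticklabels(place.cat.categories)
ax.set_yticks(range(len(name.cat.categories)))
ax.set_yticklabels(name.cat.categories)
ax.set_xlabel("Place")
ax.set_ylabel("Name")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to look at the result. The marker size `s` is in points squared, so very large counts may need scaling down.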

Or maybe a heatmap would work better? It seems to accept categoricals, so you get the labeling for free.

df3 = df.copy()
for c in ['Place','Name']:
    df3[c] = df3[c].astype('category')

sns.heatmap( df3.pivot_table( index='Place', columns='Name', values='00:00:00' ) )


JohnE
  • You are a life saver :) I was also thinking about heatmaps. Would it be possible to have annotations with the `count` of `00:00:00`? –  Jun 26 '15 at 17:31
  • `annot=True`, though it formats as float instead of integer. Not sure if there is a way to change that. You can also just print the pivot_table itself. – JohnE Jun 26 '15 at 17:41
  • The reason I am asking is that we have 2-3 values for each name, so a cell might cover several values, and I am trying to show the sum of all the counts that fall into each name's cell. –  Jun 26 '15 at 17:43
  • OK, I am not sure exactly w.r.t. seaborn. You may want to post a followup question focusing specifically on that. You can definitely put multiple aggfuncs in a pivot table, I'm just not sure offhand about translating all of that to a heatmap. – JohnE Jun 26 '15 at 17:50
  • `fmt="d"`, as in http://stackoverflow.com/questions/31055302/how-to-avoid-scientific-notation-when-annotating-a-seaborn-clustermap – mwaskom Jun 26 '15 at 18:07
  • @mwaskom Thank you so much ! –  Jun 26 '15 at 21:42
  • @JohnE When Executed both the above programs I get this error: `File "heatmap_sns.py", line 11, in df3[c] = df3[c].astype('category') File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 2018, in astype dtype, copy=copy, raise_on_error=raise_on_error) TypeError: data type "category" not understood` –  Jun 26 '15 at 21:48
  • @SitzBlogz Category is a somewhat new feature of pandas. You might need to update your version -- probably to 0.16. – JohnE Jun 26 '15 at 21:54
  • Updated pandas to version 0.16 and now I get another error: `File "heatmap_sns.py", line 13, in sns.heatmap(df3.pivot_table(index='Name', columns='Taluka', values='00:00:00', annot=True, fmt="d") ) TypeError: pivot_table() got an unexpected keyword argument 'annot' ` I cross-checked the syntax against the seaborn documentation and it looks correct, yet I still get the error. –  Jun 27 '15 at 04:17
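
Regarding the last error: judging from the traceback, `annot` and `fmt` ended up inside the `pivot_table(...)` call, whereas they are arguments of `sns.heatmap`. Below is a rough sketch of how the pieces might fit together, including the summing asked about earlier in the comments. The `aggfunc="sum"`, `fill_value=0` and `.astype(int)` parts are my additions, not from the answer: the integer cast is there because `fmt="d"` will not format floats, and empty Place/Name combinations would otherwise become NaN. The CSV is inlined and the `Agg` backend line is only an assumption to let it run without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import pandas as pd
import seaborn as sns

# Sample data from the question, inlined so the sketch is self-contained.
CSV = """Nos,Place,Way,Name,00:00:00,12:00:00
123,London,Air,Apollo,342,972
123,London,Rail,Beta,2352,342
123,Paris,Bus,Beta,545,353
345,Paris,Bus,Rava,652,974
345,Rome,Bus,Rava,2325,56
345,London,Air,Rava,2532,9853
567,Paris,Air,Apollo,545,544
567,Rome,Rail,Apollo,5454,5
876,Japan,Rail,Apollo,644,54
876,Japan,Bus,Beta,45,57
"""
df = pd.read_csv(io.StringIO(CSV))

# Sum the counts per (Place, Name) cell instead of the default mean.
# Missing combinations are filled with 0 and everything is cast to int
# so that the integer format code "d" can be applied to the annotations.
table = df.pivot_table(
    index="Place",
    columns="Name",
    values="00:00:00",
    aggfunc="sum",
    fill_value=0,
).astype(int)

# annot and fmt belong to heatmap, outside the pivot_table parentheses.
ax = sns.heatmap(table, annot=True, fmt="d")
```

In this sample every Place/Name pair occurs only once, so the sums equal the raw values; with data that has repeated pairs the cells would show the totals.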