I have a dataframe with a column of animal names, like this:

cat
dog
pig
lion
tiger
goat
dog
dog
goat
pig
cat
lion

I want to draw a horizontal bar graph using:

c=['green','pink','blue','yellow','cyan','teal','red','violet']    
df.animal.value_counts().sort_values().plot(kind='barh', color=c, alpha=0.5)

This works well.

Image showing the initial graph

But every time the count for an animal changes, the colour of its bar changes too.

Image showing how the colour for an animal changes when the count increases

I want to have a consistent colour for an animal, say "blue" for "cat" and "green" for "dog" and so on. How do I do this?

This dataset is continuously evolving and can gain new animal names over time. I would like a new colour to be assigned automatically to each animal as it gets added. Even if this cannot be done, I would be grateful for help with the initial request.

I tried various options found on Stack Overflow and elsewhere, but none of them gives me what I want.

tmdavison
Sherin Jayanand

1 Answer

The following approach uses a dictionary to assign a fixed color to each animal. After the counts are sorted, the dictionary is mapped over the index of the resulting Series, so each bar's color follows the animal name rather than the bar's position.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

animals = ['cat', 'dog', 'goat', 'lion', 'pig', 'tiger']
color_dict = {'cat': 'turquoise', 'dog': 'sienna', 'goat': 'springgreen',
              'lion': 'gold', 'pig': 'deeppink', 'tiger': 'darkorange'}

fig, axs = plt.subplots(ncols=3, figsize=(12, 3))
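for i, ax in enumerate(axs):
    # Random sample, so the counts (and the bar order) differ per subplot
    df = pd.DataFrame({'animal': np.random.choice(animals, 40)})
    df_counts = df.animal.value_counts().sort_values()
    # Look up each animal's fixed color in the sorted order of the bars;
    # equivalent to: c = [color_dict[a] for a in df_counts.index]
    c = df_counts.index.map(color_dict)
    df_counts.plot(kind='barh', color=c, alpha=0.8, ax=ax, title=f'test {i+1}')
    # Write each count just inside the right end of its bar
    for j, cnt in enumerate(df_counts):
        ax.text(cnt, j, f'{cnt} ', ha='right', va='center', c='black')
plt.show()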

example plot
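
The question also asks for new animals to receive a color automatically, which the fixed dictionary above does not cover. One possible extension, offered only as a sketch: keep the dictionary, and give any name it does not yet contain the next color from a fallback palette. The helper name `assign_colors` and the constant `FALLBACK_PALETTE` are hypothetical (they are not part of pandas or matplotlib), and the palette simply reuses the color list from the question.

```python
from itertools import cycle

# Hypothetical fallback palette (an assumption, taken from the question's list);
# any list of valid matplotlib color names would do.
FALLBACK_PALETTE = ['green', 'pink', 'blue', 'yellow',
                    'cyan', 'teal', 'red', 'violet']


def assign_colors(names, color_dict, palette=FALLBACK_PALETTE):
    """Return one color per name, adding unseen names to color_dict in place."""
    used = set(color_dict.values())
    # Prefer palette colors that are not taken yet; if none are left,
    # fall back to the whole palette. Colors repeat once the pool runs out.
    unused = [p for p in palette if p not in used]
    spare = cycle(unused or palette)
    for name in names:
        if name not in color_dict:
            color_dict[name] = next(spare)
    return [color_dict[name] for name in names]


color_dict = {'cat': 'blue', 'dog': 'green'}
colors = assign_colors(['dog', 'zebra', 'cat', 'owl'], color_dict)
print(colors)      # known animals keep their color, new ones get a spare one
print(color_dict)  # the dictionary now also contains 'zebra' and 'owl'
```

In the plotting loop this would replace the `index.map` call, along the lines of `c = assign_colors(df_counts.index, color_dict)`. Colors only stay consistent between runs if the same dictionary is reused, so in a long-lived dataset it would presumably need to be saved and reloaded (for example as JSON).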

JohanC
  • JohanC: That worked like a charm. Thank you so much. Would you happen to know how to annotate the count of each animal at the end of the bar? – Sherin Jayanand May 22 '20 at 11:45
  • https://stackoverflow.com/questions/30228069/how-to-display-the-value-of-the-bar-on-each-bar-with-pyplot-barh – JohanC May 22 '20 at 11:48
  • Is there an option to set the min and max values of the x-axis? In your example, charts 1 and 2 run from 0 to 8 and chart 3 from 0 to 10. I was wondering if I could set it to a specific range? – Sherin Jayanand May 22 '20 at 13:34
  • Well, you can call `ax.set_xlim(0, 15)`. For a bar chart, the upper limit can be set to the highest expected value plus some padding (by default the padding would be 5%). Usually, with positive heights, the lower limit is chosen as `0` (to avoid a sensation of floating). If you do it as `ax.set_xlim(0, max(12, df_counts.max() * 1.05))`, you'd get the best of both worlds: usually a fixed upper limit, but extended when needed. – JohanC May 22 '20 at 13:53
  • A quick follow-up, if I may: can I set times as the min and max? I was trying to use datetime.timedelta(seconds=0) and datetime.timedelta(seconds=max(duration)) as min and max, but this does not seem to work. – Sherin Jayanand May 25 '20 at 12:57
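
The x-axis suggestion from JohanC's comment above (a fixed upper limit that is extended only when a count would not fit) can be sketched as a small helper. The function name `upper_xlim` and its parameter names are hypothetical; the default values 12 and 5% are the ones mentioned in that comment.

```python
def upper_xlim(max_count, fixed_upper=12, padding=0.05):
    """Return the fixed upper limit, or the padded maximum if that is larger."""
    return max(fixed_upper, max_count * (1 + padding))


# A small maximum keeps the fixed limit; a large one extends the axis.
print(upper_xlim(8))
print(upper_xlim(20))
```

With the answer's variables this would be used as `ax.set_xlim(0, upper_xlim(df_counts.max()))`, which gives every subplot the same range unless one of them needs more room.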