I have table data in the following format:

id  value    valid
1   0.43323  true
2   0.83122  false
3   0.33132  true
4   0.58351  false
5   0.74143  true
6   0.44334  true
7   0.86436  false
8   0.73555  true
9   0.56534  false
10  0.66234  true
...

I am trying to plot a histogram like this one

enter image description here

Is there a way to do this with a pandas DataFrame: group the numeric values into bins from 0.0 to 0.1, 0.1 to 0.2, and so on, and color each bar by the separate counts of true and false, as in the image?

I am thinking of creating separate slices in a dictionary, counting the true/false values for each slice, and then building the histogram from those counts. Is there a better way to plot such a histogram without doing all of these calculations by hand?

What I have so far, using bins and pd.cut:

new_df = df[['value','valid']]
bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1]
s = new_df.groupby(pd.cut(new_df['value'], bins=bins)).size()
s.plot(kind='bar', stacked=True)

With this I am able to get a histogram of the total count per bin, but I am not able to color each bar by the true/false counts from the 'valid' column.

NewBee
  • If you disagree with the closure of your question: [Panda dataframe : plot histogram with grouping](https://stackoverflow.com/q/68776378/15497888) there is a [process to reopen a question](https://stackoverflow.com/help/reopen-questions) and it is decidedly not deleting and reposting the same question. – Henry Ecker Aug 13 '21 at 18:25
  • Let me follow that; it asked me to repost the question. The posts are a couple of years old, and I would like to know if there are better ways to combine both questions using newer Python libraries – NewBee Aug 13 '21 at 18:27
  • You might consider including the duplicates that were linked and _explain_ why they do not apply or what you are looking for that differs. The duplicates in question are [Binning a column with Python Pandas](https://stackoverflow.com/q/45273731/15497888) and [Pandas - Plotting a stacked Bar Chart](https://stackoverflow.com/q/23415500/15497888) for those without the ability to see deleted questions. – Henry Ecker Aug 13 '21 at 18:29
  • Thanks, Henry, for pointing out the questions. As mentioned earlier, these two questions cover binning and groupby/count separately. The idea of this question is to combine both solutions, which I am having a hard time doing; i.e., I can generate bins and plot the histogram, but I cannot color code it with separate true/false counts – NewBee Aug 13 '21 at 18:39
  • Let me add the code I have so far – NewBee Aug 13 '21 at 18:49

1 Answer

Try:

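import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Reproducible sample data: 1000 random values, roughly 70% flagged valid
np.random.seed(123)

df = pd.DataFrame(
    {
        "value": np.random.random(1000),
        "valid": np.random.choice([True, False], p=[0.7, 0.3], size=1000),
    }
)

# Assign each value to a bin of width 0.1: (0.0, 0.1], (0.1, 0.2], ...
df["label"] = pd.cut(df["value"], bins=np.arange(0, 1.01, 0.1))

# Count rows per (bin, valid) pair, pivot 'valid' into columns,
# then draw one stacked bar per bin.
# observed=False keeps the current default explicit on newer pandas versions.
ax = (
    df.groupby(["label", "valid"], observed=False)
    .count()
    .unstack()["value"]
    .plot.bar(stacked=True, rot=0, figsize=(10, 7))
)
ax.legend(loc="upper center")
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
_ = ax.set_ylim(0, 150)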

Output:

enter image description here
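
If it helps to look at the numbers behind the bars before plotting, the count table can be built on its own. The following is only a sketch under assumptions: it uses the ten rows shown in the question as sample data, and the names `labels` and `counts` are made up for illustration. It uses `size()` instead of `count()`, which is a small variation on the code above, not a different technique.

```python
import numpy as np
import pandas as pd

# The ten sample rows from the question (illustrative data only).
df = pd.DataFrame(
    {
        "value": [0.43323, 0.83122, 0.33132, 0.58351, 0.74143,
                  0.44334, 0.86436, 0.73555, 0.56534, 0.66234],
        "valid": [True, False, True, False, True,
                  True, False, True, False, True],
    }
)

# Eleven edges give ten right-closed bins: (0.0, 0.1], (0.1, 0.2], ...
bins = np.linspace(0, 1, 11)
labels = pd.cut(df["value"], bins=bins)

# Rows are bins, columns are False/True, cells are row counts.
# observed=False keeps bins that contain no rows, so every bin gets a row.
counts = (
    df.groupby([labels, "valid"], observed=False)
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Calling `counts.plot.bar(stacked=True, rot=0)` on this table should draw the same kind of stacked chart. One thing to keep in mind: `pd.cut` bins are right-closed by default, so a value of exactly 0 falls outside the first bin unless `include_lowest=True` is passed.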

Scott Boston