I have a data frame which looks like this:

     legal    value
0    1        3
1    1        7
2    0        10
3    1        12
4    1        4
5    1        17
6    0        21
7    1        19
8    1        3
9    0        18
10   1        17
11   1        17
12   0        11
13   1        23

and I'm trying to split only the legal 1 values into 6 bin intervals for a histogram. The intervals look like:

[0-6], [6-9], [9-12], [12-16], [16-20], [20-24]

The data gathered would then look like this:

bin    frequency   values
0-6    3           3, 4, 3
6-9    1           7
9-12   0           (none)
12-16  1           12
16-20  4           17, 17, 19, 17
20-24  1           23

I am trying to create a histogram with the bin intervals on the x axis and the frequency of the legal 1 values on the y axis. Basically, I want a histogram which looks like this example.

So far I have written this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict 

bins = ['0-6', '6-9', '9-12', '12-16', '16-20', '20-24']
df = pd.read_csv('data.csv', encoding = 'ISO-8859-1')

d = defaultdict(int)
for legal, value in zip(df['legal'], df['value']):
    if (legal == 1):
        if (0 <= value <= 6):
            d[bins[0]] += 1

This tries to group the bins with a dictionary, but it seems overcomplicated, and there must be a better way using the pandas library.

How can I use something like pandas.DataFrame.groupby to group the bins with their respective frequencies, then plot these values on a histogram using matplotlib.pyplot?

RoadRunner

1 Answer

No grouping is needed; the dataframe can simply be filtered on the "legal" column, and the bin edges passed to plt.hist.

import matplotlib.pyplot as plt
import pandas as pd

legal = [1,1,0,1,1,1,0,1,1,0,1,1,0,1]
value = [3,7,10,12,4,17,21,19,3,18,17,17,11,23]
df = pd.DataFrame({"legal":legal, "value":value})

# keep only the rows where legal == 1
df2 = df[df["legal"] == 1]

# bin edges: 6 bins need 7 edges
bins = [0,6,9,12,16,20,24]
plt.hist(df2["value"], bins=bins, edgecolor="k")
# put the x ticks at the bin edges
plt.xticks(bins)

plt.show()
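
If the frequency table itself is wanted (as in the question), one possible sketch is to bin the filtered values with pandas.cut and count them. The names edges, labels, legal_values, binned, frequency and np_counts are only illustrative, and right=False is an assumption chosen so that each bin includes its left edge, which is how plt.hist and numpy.histogram treat every bin except the last (whose right edge they also include).

```python
import numpy as np
import pandas as pd

legal = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
value = [3, 7, 10, 12, 4, 17, 21, 19, 3, 18, 17, 17, 11, 23]
df = pd.DataFrame({"legal": legal, "value": value})

edges = [0, 6, 9, 12, 16, 20, 24]
labels = ["0-6", "6-9", "9-12", "12-16", "16-20", "20-24"]

# keep only the values whose legal flag is 1
legal_values = df.loc[df["legal"] == 1, "value"]

# assign each value to a labelled bin; right=False means [left, right)
binned = pd.cut(legal_values, bins=edges, right=False, labels=labels)

# count per bin, in bin order; empty bins are kept with a count of 0
frequency = binned.value_counts().sort_index()

# the same counts computed by numpy, which is what plt.hist draws
np_counts, _ = np.histogram(legal_values, bins=edges)

print(frequency)
print(np_counts)
```

Calling frequency.plot.bar() on the result should give equal-width bars labelled by bin, whereas plt.hist with explicit bin edges draws each bar as wide as its interval.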

ImportanceOfBeingErnest