I have a data frame which looks like this:
    legal  value
0       1      3
1       1      7
2       0     10
3       1     12
4       1      4
5       1     17
6       0     21
7       1     19
8       1      3
9       0     18
10      1     17
11      1     17
12      0     11
13      1     23
and I'm trying to split only the values where legal is 1 into 6 bins for a histogram. The intervals are:
[0-6], [6-9], [9-12], [12-16], [16-20], [20-24]
The data gathered would then look like this:
bin     frequency   values
0-6     3           3, 4, 3
6-9     1           7
9-12    1           12
12-16   1           12
16-20   4           17, 17, 19, 17
20-24   1           23
I am trying to create a histogram with the bin intervals on the x axis and the frequency of the legal 1 values on the y axis. Basically, I want a histogram that looks like this example.
So far I have written this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
bins = ['0-6', '6-9', '9-12', '12-16', '16-20', '20-24']
df = pd.read_csv('data.csv', encoding = 'ISO-8859-1')
d = defaultdict(int)
for legal, value in zip(df['legal'], df['value']):
    if legal == 1:
        if 0 <= value <= 6:
            d[bins[0]] += 1
This tries to count the values per bin with a dictionary, but it seems overcomplicated, and there must be a better way using the pandas library.
How can I use something like pandas.DataFrame.groupby to group the bins with their respective frequencies, and then plot these values as a histogram using matplotlib.pyplot?
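
One possible approach, sketched here under a few assumptions: filter the rows where legal is 1, label each value with pandas.cut, count per bin with groupby, and draw the counts as bars. The sample frame is typed in from the table above instead of being read from data.csv. The names edges, labels, legal_rows and counts are only illustrative. The bins are assumed to be closed on the right (0-6 means 0 <= value <= 6, 6-9 means 6 < value <= 9, and so on), which matches the `0 <= value <= 6` test in the attempt above. Under that assumption the value 12 falls only into 9-12, so 12-16 comes out empty, whereas the expected table lists 12 in both bins.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question; in practice this would come from pd.read_csv.
df = pd.DataFrame({
    "legal": [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
    "value": [3, 7, 10, 12, 4, 17, 21, 19, 3, 18, 17, 17, 11, 23],
})

# Bin edges and display labels (illustrative names, not from the question).
edges = [0, 6, 9, 12, 16, 20, 24]
labels = ["0-6", "6-9", "9-12", "12-16", "16-20", "20-24"]

# Keep only the rows where legal == 1.
legal_rows = df[df["legal"] == 1].copy()

# Assumption: right-closed bins; include_lowest=True makes the first bin include 0.
legal_rows["bin"] = pd.cut(
    legal_rows["value"], bins=edges, labels=labels, include_lowest=True
)

# Count values per bin; observed=False keeps empty bins with a count of 0.
counts = legal_rows.groupby("bin", observed=False)["value"].count()
print(counts)

# Draw the counts as touching bars so the chart reads like a histogram.
fig, ax = plt.subplots()
ax.bar(counts.index.astype(str), counts.to_numpy(), width=1.0, edgecolor="black")
ax.set_xlabel("bin")
ax.set_ylabel("frequency (legal == 1)")
```

Calling plt.show() afterwards displays the figure. If the bins should be closed on the left instead, passing right=False to pandas.cut changes which bin the edge values (such as 12) land in.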