Given the following dataframe in pandas:
import numpy as np
import pandas
df = pandas.DataFrame({"a": np.random.random(100), "b": np.random.random(100), "id": np.arange(100)})
where `id` is an id for each point consisting of an `a` and `b` value, how can I bin `a` and `b` into a specified set of bins (so that I can then take the median/average value of `a` and `b` in each bin)? `df` might have `NaN` values for `a` or `b` (or both) for any given row in `df`.
Here's a better example using Joe Kington's solution with a more realistic `df`. The thing I'm unsure about is how to access the `df.b` elements for each `df.a` group below:
a = np.random.random(20)
df = pandas.DataFrame({"a": a, "b": a + 10})
# bins for df.a
bins = np.linspace(0, 1, 10)
# bin df according to a
groups = df.groupby(np.digitize(df.a,bins))
# Get the mean of a in each group
print(groups.mean())
## But how to get the mean of b for each group of a?
# ...
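
For reference, here is a minimal sketch of one way the per-group values of `b` might be reached. It rests on a few assumptions that are not part of the example above: pandas is imported as `pd`, `pandas.cut` is swapped in for `np.digitize` (so this is not Joe Kington's solution as written), and the seeded generator and the two injected `NaN` values are purely illustrative. Selecting a column on the groupby object (`groups["b"]`) restricts the aggregation to that column, while rows are still grouped by the bin of `a`.

```python
import numpy as np
import pandas as pd

# Illustrative data only: seeded so the run is repeatable.
rng = np.random.default_rng(0)
a = rng.random(20)
df = pd.DataFrame({"a": a, "b": a + 10})

# Inject missing values to mimic the NaN case described in the question.
df.loc[3, "a"] = np.nan
df.loc[5, "b"] = np.nan

# Bin edges for df.a
bins = np.linspace(0, 1, 10)

# Label each row with the interval its `a` value falls in.
# Rows where `a` is NaN get a NaN label and are left out of the groups.
a_bin = pd.cut(df["a"], bins)

# Group the whole frame by the bin of `a`; keep only bins that occur.
groups = df.groupby(a_bin, observed=True)

# Mean of `b` within each group of `a` (NaN values of `b` are skipped).
b_mean = groups["b"].mean()

# Mean and median of both columns in one table.
summary = groups[["a", "b"]].agg(["mean", "median"])

print(b_mean)
print(summary)
```

Using `pd.cut` here is a design choice rather than a requirement: it returns interval labels, which makes the grouped output easier to read, and it leaves missing values unbinned instead of assigning them to a bin.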