
I have a DataFrame with 70,000 ages, and I want to group them into these age ranges:

18-30
30-50
50-99

Then I want to compare the groups using another column, revenue.



Answer

If you have a dataframe like this one:

N = 1000
df = pd.DataFrame({'age': np.random.randint(18, 99, N),
                   'revenue': 20 + 200*np.abs(np.random.randn(N))})
   age     revenue
0   69   56.776670
1   32   40.019089
2   89   38.045533
3   78  176.214654
4   38  527.738220
5   92  124.790533
6   92  137.617365
7   41   46.680172
8   20  234.199293
9   39  136.560120
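Because the data are random, the numbers above will differ on every run. A minimal sketch that makes the sample reproducible, assuming NumPy's newer Generator API (np.random.default_rng) and an arbitrary seed of 0, neither of which is part of the original answer:

```python
import numpy as np
import pandas as pd

N = 1000
rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

df = pd.DataFrame({
    # integers() excludes the upper bound by default, so ages run from 18 to 98
    'age': rng.integers(18, 99, N),
    # absolute value of a normal draw, shifted so every revenue is at least 20
    'revenue': 20 + 200 * np.abs(rng.standard_normal(N)),
})
print(df.head(10))
```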

You can then assign each row to an age group with pandas.cut:

df['group'] = pd.cut(df['age'], bins=[18, 30, 50, 99], include_lowest=True,
                     labels=['18-30', '30-50', '50-99'])
   age     revenue  group
0   69   56.776670  50-99
1   32   40.019089  30-50
2   89   38.045533  50-99
3   78  176.214654  50-99
4   38  527.738220  30-50
5   92  124.790533  50-99
6   92  137.617365  50-99
7   41   46.680172  30-50
8   20  234.199293  18-30
9   39  136.560120  30-50
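Note that pd.cut builds right-closed intervals by default (right=True), so a boundary age such as 30 lands in the lower group and 31 in the next one; include_lowest=True is what keeps 18 in the first group. A small check on a few hand-picked boundary ages (the ages list is purely illustrative):

```python
import pandas as pd

# boundary values chosen to show which group each edge falls into
ages = pd.Series([18, 30, 31, 50, 51, 99])
groups = pd.cut(ages, bins=[18, 30, 50, 99], include_lowest=True,
                labels=['18-30', '30-50', '50-99'])
print(pd.DataFrame({'age': ages, 'group': groups}))
```

If you would rather have 30 start the second group, pass right=False and widen the last edge (for example bins=[18, 30, 50, 100]) so the intervals become [18, 30), [30, 50) and [50, 100).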

Then compute the mean of each age group with pandas.DataFrame.groupby:

df = df.groupby(by='group', observed=True).mean()
             age     revenue
group                       
18-30  23.534091  184.895077
30-50  40.529183  185.348380
50-99  73.902998  170.889141
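A mean alone can hide differences in group size and spread, so to compare the groups more fully you can aggregate revenue with several statistics at once. A minimal sketch on a tiny hand-made frame (the values are invented for illustration only):

```python
import pandas as pd

# toy data: two rows per age group
df = pd.DataFrame({
    'age':     [20, 25, 35, 45, 60, 70],
    'revenue': [100.0, 200.0, 50.0, 150.0, 300.0, 100.0],
})
df['group'] = pd.cut(df['age'], bins=[18, 30, 50, 99], include_lowest=True,
                     labels=['18-30', '30-50', '50-99'])

# observed=True keeps only groups that occur and avoids the FutureWarning
# that pandas 2.1+ emits for categorical groupers
summary = df.groupby('group', observed=True)['revenue'].agg(['count', 'mean', 'median'])
print(summary)
```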

Finally, plot the mean revenue of each group:

fig, ax = plt.subplots()

ax.bar(x=df.index, height=df['revenue'])

plt.show()
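The bare bar chart has no axis labels. A sketch that adds them and writes the figure to a file instead of opening a window, assuming a non-interactive environment; the Agg backend, the seed and the filename revenue_by_age.png are assumptions, not part of the original answer:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line and call plt.show() to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for reproducible data
df = pd.DataFrame({'age': rng.integers(18, 99, 1000),
                   'revenue': 20 + 200 * np.abs(rng.standard_normal(1000))})
df['group'] = pd.cut(df['age'], bins=[18, 30, 50, 99], include_lowest=True,
                     labels=['18-30', '30-50', '50-99'])

# mean revenue per age group, as a Series indexed by group label
means = df.groupby('group', observed=True)['revenue'].mean()

fig, ax = plt.subplots()
ax.bar(means.index.astype(str), means.values)
ax.set_xlabel('Age group')
ax.set_ylabel('Mean revenue')
ax.set_title('Mean revenue by age group')
fig.savefig('revenue_by_age.png')  # hypothetical output filename
```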

Complete Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


N = 1000
df = pd.DataFrame({'age': np.random.randint(18, 99, N),
                   'revenue': 20 + 200*np.abs(np.random.randn(N))})

df['group'] = pd.cut(df['age'], bins=[18, 30, 50, 99], include_lowest=True,
                     labels=['18-30', '30-50', '50-99'])

df = df.groupby(by='group', observed=True).mean()


fig, ax = plt.subplots()

ax.bar(x=df.index, height=df['revenue'])

plt.show()
