0

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

How can I calculate how many children were born in each year? For instance, according to the DataFrame above, in 2000 there were 20 + 500 + 1000 = 1520 new children.

Magofoco
dingaro

3 Answers

2

You can try:

my_final = df.groupby("Year")["Count"].sum()

print(my_final)

This will calculate the number of children per year.
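
As a minimal, self-contained sketch of the grouping step, the snippet below rebuilds the sample frame from the question and sums `Count` per `Year`. The `totals_df` variant with `as_index=False` is an addition here, not part of the original answer; it may be handy when `Year` should stay a regular column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

# Group rows by year, then add up the Count column within each group.
# The result is a Series indexed by Year.
my_final = df.groupby("Year")["Count"].sum()
print(my_final)

# Optional variant (an assumption about what may be convenient):
# as_index=False keeps Year as an ordinary column in a DataFrame.
totals_df = df.groupby("Year", as_index=False)["Count"].sum()
print(totals_df)
```

For the sample data this gives 1520 for 2000 and 146 for 2001.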

Magofoco
  • Perfect, thank you. How can I check which year had the biggest number of new children? – dingaro Nov 23 '19 at 17:05
  • `df.groupby("Year")["Count"].sum().sort_values(ascending=False)` the top result is the one with the largest number. – Magofoco Nov 23 '19 at 17:08
  • Perfect. One last question, if you have a second: how can I plot, for instance, the top 5 years with the biggest number of new children as a bar graph? – dingaro Nov 23 '19 at 17:21
  • It is best if you first try it yourself. Check: matplotlib – Magofoco Nov 23 '19 at 17:22
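
For readers following the comment thread, here is one possible sketch of the "top 5 years" step. It uses `Series.nlargest` instead of a full sort, which is a substitution rather than what the comment suggests. The `plot_top_years` helper is hypothetical, and it assumes matplotlib is installed, since pandas plotting depends on it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

totals = df.groupby("Year")["Count"].sum()

# nlargest keeps at most the n biggest totals, sorted in descending order.
top_years = totals.nlargest(5)
print(top_years)


def plot_top_years(series):
    """Hypothetical helper: bar chart of per-year totals (needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = series.plot(kind="bar")
    ax.set_xlabel("Year")
    ax.set_ylabel("Children born")
    plt.show()
```

Calling `plot_top_years(top_years)` would draw one bar per year. With only two years in the sample data, `nlargest(5)` simply returns both.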
0

This will calculate the new children in the year 2000:

df[df["Year"]==2000]["Count"].sum()
Josua
  • This only computes one year instead of each year group. On top of that, using chained indexing is a poor choice. – cs95 Nov 23 '19 at 18:26
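
To illustrate the point about chained indexing, a single-year total can be written with one `.loc` call that selects rows and column together. This is a sketch of one common alternative, not necessarily what the commenter had in mind; the name `is_2000` is introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

# Boolean mask marking the rows for the year of interest.
is_2000 = df["Year"] == 2000

# One .loc call picks the rows and the column in a single step,
# instead of indexing twice as in df[mask]["Count"].
total_2000 = df.loc[is_2000, "Count"].sum()
print(total_2000)
```

This still covers a single year only; for every year at once, a `groupby` is the better fit.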
0

To get the year with the highest number of children (as the OP asked in the comments on one of the other answers), try the following. It returns the year in which the most children were born.

df.groupby('Year')['Count'].sum().idxmax()
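
As a hedged sketch of how to read off both the best year and its total, the snippet below uses `idxmax` on the per-year sums. It also shows, for comparison, that calling `.max()` on the whole frame after `reset_index()` takes each column's maximum separately, so the `Year` and `Count` it reports need not belong to the same row. The names `best_year`, `best_count` and `colwise` are introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Name": ["Alice", "Ana", "Tom", "John", "Frank", "Alice"],
    "Count": [20, 500, 1000, 30, 50, 66],
})

totals = df.groupby("Year")["Count"].sum()

# idxmax returns the index label (the year) of the largest total.
best_year = totals.idxmax()
# max returns the largest total itself.
best_count = totals.max()
print(best_year, best_count)

# Column-wise maximum: the largest Year and the largest Count,
# each computed independently of the other.
colwise = totals.reset_index().max()
print(colwise)
```

For the sample data, `best_year` is 2000 with a total of 1520, while the column-wise maximum pairs `Year` 2001 with `Count` 1520.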
RavinderSingh13