
I have a DataFrame called df_civic with columns for state, rank, make/model, model year, and thefts. I want to calculate the mean and standard deviation of thefts for each model year.

I collect all the years present in the DataFrame with: years_civic = list(pd.unique(df_civic['Model Year']))

My loop looks like this:

for civic_year in years_civic:
    f = df_civic['Model Year'] == civic_year
    civic_avg = df_civic[f]['Thefts'].mean()
    civic_std = df_civic[f]['Thefts'].std()
    civic_std= np.round(car_std,2)
    civic_avg= np.round(car_avg,2)
    print(civic_avg, civic_std, np.sum(f))

However, the output is not what I need; the only value that is correct is the one from np.sum(f).

The output currently looks like this:

9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 13
9.0 20.51 15
9.0 20.51 3
9.0 20.51 2
Akshay Sehgal
  • Please include sample data and format your question according to tips provided in this post: https://stackoverflow.com/a/20159305 – navneethc Jan 06 '21 at 17:22
  • @Aleksander, you can use triple ``` code ```, to mark a code block over multiple lines. Took a while to edit your 100s of code blocks and
    s :) .. Also, you can simple move a new line to another line without using
    . Its allowed in markdown. Check how I edited your question to format your question better next time. Cheers.
    – Akshay Sehgal Jan 06 '21 at 17:22
  • Hi, sorry i'll use correct ones next time! – Aleksander Kuś Jan 06 '21 at 17:26

1 Answer


Pandas provides many powerful functions for aggregating data. It is usually better to reach for these functions before writing a for loop.

For instance, you can use:

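import pandas as pd

# Mean and sample standard deviation of Thefts for each model year.
# The column name must match the DataFrame exactly ("Thefts", as in the question's code).
df_civic.groupby("Model Year").agg({"Thefts": ["mean", "std"]})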

More documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html
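To make the groupby approach concrete, here is a minimal sketch on a small made-up DataFrame. The sample values are hypothetical and only the "Model Year" and "Thefts" columns from the question are assumed; "count" is added to stand in for the np.sum(f) column of the original loop.

```python
import pandas as pd

# Hypothetical sample data; the real df_civic has more columns and rows.
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 1998, 2000, 2000, 2001],
    "Thefts": [10, 20, 30, 5, 15, 9],
})

# One row per model year: mean, sample standard deviation, and row count.
summary = (
    df_civic.groupby("Model Year")["Thefts"]
    .agg(["mean", "std", "count"])
    .round(2)
)
print(summary)
```

Note that a model year with a single row has a standard deviation of NaN, because pandas computes the sample standard deviation (ddof=1) by default.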

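Regarding your code, there is something odd: the loop rounds car_std and car_avg, which are not defined in the snippet, instead of civic_std and civic_avg. If those names are left over from earlier code, that would explain why the same two values are printed on every line.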
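If you prefer to keep the loop, one possible repair is to round the values computed inside it. This is only a sketch on a small made-up DataFrame (the sample values are hypothetical); the results list is an addition for illustration and is not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df_civic has more columns and rows.
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 1998, 2000, 2000, 2001],
    "Thefts": [10, 20, 30, 5, 15, 9],
})

years_civic = list(pd.unique(df_civic["Model Year"]))

results = []  # collected only so the values can be inspected afterwards
for civic_year in years_civic:
    f = df_civic["Model Year"] == civic_year   # boolean mask for this year
    thefts = df_civic.loc[f, "Thefts"]
    # Round the values computed in this iteration, not car_avg / car_std.
    civic_avg = np.round(thefts.mean(), 2)
    civic_std = np.round(thefts.std(), 2)
    results.append((civic_year, civic_avg, civic_std, int(f.sum())))
    print(civic_avg, civic_std, f.sum())
```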

Nicoowr