1

I have a dataset with dates and data points for that specific date (d1, d2, d3, etc.) for every stock for each country. Some datapoints are missing for some stocks within each country and I want to replace them with average for those stocks in other countries

date  stock     d1  d2 d3 country
12.94 xyz corp  12   3  4 US
12.95 xyz corp  13 NaN  1 US
12.95 123 corp   3   4 12 US
12.94 abc corp   1   3  5 CA
12.94 abc corp NaN   3  4 CA

So, in above data point d2 for xyz on 12.95 needs to be replaced by average of d2 within US for 12.95

I would appreciate any insight on how to do that. I created an index of unique dates and planned on using pivot table where values iterate through various data points such as d1, d2, etc

cnt_avgs=rawdt.pivot_table(values=["d1",index=["country","],aggfunc=np.mean)
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
f1racer
  • 45
  • 5
  • 2
    you have 7 columns and 6 column names, please checkout this [post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) about asking question about pandas and update the dataframe – DJK Aug 19 '17 at 03:48

2 Answers2

0

I'm not exactly sure if this is what you are looking for. But you can iterate over all the NaN columns and then the missing value rows and substitute the missing values using numpy.mean and conditional pandas slicing:

convert list into a pandas dataframe:

df = pd.DataFrame(dt[1:], columns=dt[0])

Check and iterate over columns with NaN values. Then, for the columns that have NaN, iterate over the rows and change the data using numpy mean function and pandas conditional slicing:

for col in df.columns[df.isnull().any()]:
    for row in df[df[col].isnull()].iterrows():
        df.loc[row[0], col] = np.mean(df[(df['date'] == row[1]['date']) & (df['country'] == row[1]['country'])][col])
Vico
  • 579
  • 3
  • 13
0

IIUC, use groupby and fillna:

df.groupby(['date','country'],as_index=False)\
  .apply(lambda x: x.fillna(x.mean()))\
  .reset_index(drop=True)

Output:

    date     stock    d1   d2  d3 country
0  12.94  abc corp   1.0  3.0   5      CA
1  12.94  abc corp   1.0  3.0   4      CA
2  12.94  xyz corp  12.0  3.0   4      US
3  12.95  xyz corp  13.0  4.0   1      US
4  12.95  123 corp   3.0  4.0  12      US
Scott Boston
  • 147,308
  • 15
  • 139
  • 187